PUB: A Class Description Technique Based on Partial Coverage of Subspace

  • Authors:
  • Ardian Kristanto Poernomo; Vivekanand Gopalkrishnan

  • Venue:
  • ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
  • Year:
  • 2009

Abstract

A good description of a class should be accurate and interpretable. Previous works describe classes either by analyzing the correlation of each attribute with the class, or by producing rules as in building a classifier. These solutions suffer from issues in accuracy and interpretability. A description naturally consists of sentences, where each sentence consists of a set of terms. Normally, a sentence is defined as a disjunction or conjunction of several terms, each of which specifies a constraint (range/set of values) on an attribute. From the data analysis point of view, a sentence specifies a subspace in the database. In this paper, we create a richer yet interpretable form of a sentence, i.e., a sentence describes an object if any k attributes of that object satisfy the specified constraints. To that end, we design PUB, an algorithm that produces descriptions with our form of sentences. While constructing a sentence (within the description), PUB finds the optimal range/set of values for each attribute in linear time. We also empirically show that PUB is efficient, and able to produce more accurate, concise and interpretable descriptions than current approaches on various real datasets.
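
To make the sentence form in the abstract concrete, here is a minimal sketch of how such a sentence could be evaluated against one object. It is not the PUB algorithm and does not search for optimal ranges. It assumes that "any k attributes" means "at least k of the sentence's constraints hold". The helper names (`in_range`, `in_set`, `sentence_describes`) and the example attributes are hypothetical.

```python
from typing import Any, Callable, Dict

# A term is a constraint on one attribute: a predicate over that attribute's value.
Constraint = Callable[[Any], bool]


def in_range(lo: float, hi: float) -> Constraint:
    """Hypothetical helper: term that holds when lo <= value <= hi."""
    return lambda v: lo <= v <= hi


def in_set(*values: Any) -> Constraint:
    """Hypothetical helper: term that holds when value is one of `values`."""
    allowed = frozenset(values)
    return lambda v: v in allowed


def sentence_describes(obj: Dict[str, Any],
                       constraints: Dict[str, Constraint],
                       k: int) -> bool:
    """Return True if at least k of the constraints hold for obj.

    Assumption: attributes missing from obj count as not satisfied.
    """
    satisfied = sum(
        1 for attr, holds in constraints.items()
        if attr in obj and holds(obj[attr])
    )
    return satisfied >= k


# Illustrative sentence with three terms (made-up attributes and values).
sentence = {
    "age": in_range(30, 50),
    "income": in_range(40_000, 90_000),
    "region": in_set("north", "east"),
}
obj = {"age": 42, "income": 120_000, "region": "east"}

# Two of the three terms hold for obj (age and region).
covered_k2 = sentence_describes(obj, sentence, k=2)
covered_k3 = sentence_describes(obj, sentence, k=3)
```

Under this reading, k = 1 reduces to a disjunction of the terms and k equal to the number of terms reduces to a conjunction, so the form covers both conventional sentence types named in the abstract and the cases in between.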