Class description using partial coverage of subspaces

Authors:
Ardian Kristanto Poernomo;Vivekanand Gopalkrishnan
Affiliations:
Nanyang Technological University, Singapore;Nanyang Technological University, Singapore
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Citing 8

Generalization-based data mining in object-oriented databases using an object cube model
Data & Knowledge Engineering - Special jubilee issue: DKE 25

Data Mining with optimized two-dimensional association rules
ACM Transactions on Database Systems (TODS)

Efficient discovery of error-tolerant frequent itemsets in high dimensions
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

Feature Selection for Knowledge Discovery and Data Mining
Feature Selection for Knowledge Discovery and Data Mining

Discovery-Driven Exploration of OLAP Data Cubes
EDBT '98 Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology

Automatic Subspace Clustering of High Dimensional Data
Data Mining and Knowledge Discovery

Quantitative evaluation of approximate frequent pattern mining algorithms
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

PUB: A Class Description Technique Based on Partial Coverage of Subspace
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining

Cited 1

Big data, big business: bridging the gap
Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications

Quantified Score

Hi-index	12.06

Abstract

A good description of a class should be (reasonably) accurate and interpretable. Previous work addresses this class-description problem either by analyzing the correlation of each attribute with the class or by producing rules, as when building a classifier. These solutions suffer from issues in accuracy and interpretability. A sentence is usually defined as a disjunction or conjunction of several terms, each of which specifies a constraint (a range or set of values) on an attribute. From the data analysis point of view, a sentence specifies a subspace in the database. In this paper, we create a richer yet interpretable form of a sentence. Here, a sentence describes an object if any k attributes of that object satisfy the specified constraints; in other words, the object is partially covered by the subspace. Since this simple enhancement subsumes the rules used in previous solutions, descriptions based on such sentences are provably better. To that end, we design Pub, an algorithm that produces descriptions with our form of sentences. Theoretically, while constructing a sentence (within the description), Pub finds the optimal range or set of values for each attribute in linear time. Empirically, we show that Pub is efficient and produces more accurate, concise and interpretable descriptions than current approaches on various real datasets. We also perform an illustrative case study on the Glass dataset, providing some useful insights.
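
The partial-coverage idea in the abstract can be illustrated with a minimal Python sketch. It is not the Pub algorithm and does not search for optimal ranges; it only checks whether a given sentence covers an object. The function names (`in_range`, `in_set`, `partially_covers`), the attribute names, and the values are all hypothetical, and "any k attributes" is read here as "at least k of the constrained attributes", which is an assumption.

```python
from typing import Callable, Dict, Mapping

# A term is a constraint on one attribute: a range or a set of values.
Term = Callable[[object], bool]


def in_range(lo: float, hi: float) -> Term:
    """Term satisfied when the value lies in the closed range [lo, hi]."""
    return lambda v: lo <= v <= hi


def in_set(*values: object) -> Term:
    """Term satisfied when the value is one of the listed values."""
    allowed = frozenset(values)
    return lambda v: v in allowed


def partially_covers(terms: Mapping[str, Term], k: int,
                     obj: Mapping[str, object]) -> bool:
    """Return True if at least k terms of the sentence hold for obj.

    Assumption: attributes missing from obj count as unsatisfied.
    """
    satisfied = sum(
        1 for attr, term in terms.items()
        if attr in obj and term(obj[attr])
    )
    return satisfied >= k


# Hypothetical sentence over three made-up attributes.
sentence: Dict[str, Term] = {
    "width": in_range(4, 6),
    "weight": in_range(10, 50),
    "color": in_set("red", "blue"),
}

# Hypothetical object: width and color satisfy their terms, weight does not.
obj = {"width": 5, "weight": 90, "color": "red"}

covered_k2 = partially_covers(sentence, 2, obj)              # partial coverage
covered_all = partially_covers(sentence, len(sentence), obj)  # conjunction
covered_any = partially_covers(sentence, 1, obj)              # disjunction
```

Under this reading, setting k to the number of terms reduces the sentence to a plain conjunction and setting k to 1 reduces it to a plain disjunction, which is one way to see why the abstract says the new form subsumes the rules used in earlier approaches.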