PUB: A Class Description Technique Based on Partial Coverage of Subspace
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
A good description of a class should be (reasonably) accurate and interpretable. Previous work addresses this class-description problem either by analyzing the correlation of each attribute with the class or by producing rules, as one would when building a classifier. These solutions suffer from limited accuracy and interpretability. A sentence is usually defined as a disjunction or conjunction of several terms, each of which specifies a constraint (a range or set of values) on an attribute. From the data analysis point of view, a sentence specifies a subspace in the database. In this paper, we create a richer yet still interpretable form of sentence. Here, a sentence describes an object if any k attributes of that object satisfy the specified constraints; in other words, the object is partially covered by the subspace. Since this simple enhancement subsumes the rules used in previous solutions, descriptions based on such sentences are provably better. To that end, we design Pub, an algorithm that produces descriptions with our form of sentence. Theoretically, while constructing a sentence (within the description), Pub finds the optimal range or set of values for each attribute in linear time. Empirically, we show that Pub is efficient and able to produce more accurate, concise, and interpretable descriptions than current approaches on various real datasets. We also perform an illustrative case study on the Glass dataset, which provides some useful insights.
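The partial-coverage test described in the abstract can be illustrated with a minimal sketch. It is not the Pub algorithm itself: it only checks whether a given sentence covers an object and does not search for optimal ranges. It reads "any k attributes" as "at least k", and the helper names (`in_range`, `in_set`, `partially_covers`) and the attributes `width`, `height`, and `color` are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, Mapping

# A term is a predicate on one attribute value (hypothetical representation).
Term = Callable[[Any], bool]


def in_range(lo: float, hi: float) -> Term:
    """Term satisfied when the value lies in the closed range [lo, hi]."""
    return lambda v: lo <= v <= hi


def in_set(*values: Any) -> Term:
    """Term satisfied when the value is one of the allowed values."""
    allowed = frozenset(values)
    return lambda v: v in allowed


def partially_covers(terms: Mapping[str, Term], obj: Mapping[str, Any], k: int) -> bool:
    """Return True if at least k of the sentence's terms hold for obj.

    Assumption: "any k attributes" is read as "at least k".
    Attributes missing from obj count as unsatisfied.
    """
    satisfied = sum(
        1 for attr, term in terms.items() if attr in obj and term(obj[attr])
    )
    return satisfied >= k


# Hypothetical sentence with three terms and one object to test.
sentence = {
    "width": in_range(0, 10),
    "height": in_range(0, 10),
    "color": in_set("red", "blue"),
}
obj = {"width": 5, "height": 20, "color": "red"}

covered_any = partially_covers(sentence, obj, k=1)   # disjunction of the terms
covered_two = partially_covers(sentence, obj, k=2)   # partial coverage
covered_all = partially_covers(sentence, obj, k=len(sentence))  # conjunction
```

Under this reading, k = 1 behaves like a disjunction of the terms and k equal to the number of terms behaves like a conjunction, which is one way to see why the abstract says the new form subsumes earlier rule forms.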