For a book, the title and abstract give a good first impression of what to expect. For a database, obtaining such a first impression is typically less straightforward: low-order statistics provide only very limited insight, while fully mining the data quickly yields far too much detail for a quick glance. In this paper we propose a middle ground and introduce a parameter-free method for constructing high-quality descriptive summaries of binary and categorical data. Our approach builds a summary by clustering strongly correlated attributes, and uses the Minimum Description Length principle to identify the best clustering, without requiring a distance measure between attributes. Besides giving a practical overview of which attributes interact most strongly, these summaries can serve as surrogates for the data and can easily be queried. Extensive experiments show that our method discovers high-quality results: correlated attributes are correctly grouped, as verified both objectively and subjectively. As an example of using the summaries as surrogates, we show that they allow the supports of frequent generalized itemsets to be estimated quickly and accurately.
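To make the idea concrete, here is a minimal Python sketch of MDL-based attribute clustering. It is not the authors' algorithm or encoding, and it rests on several assumptions.

- **Cost of a clustering.** We take it to be the sum of per-cluster costs. The data cost of a cluster is n times the empirical entropy of its value combinations. The model cost is a crude, hypothetical stand-in: one bit per attribute per distinct combination, plus log2(n) bits for that combination's count.
- **Search.** The clustering is found by a greedy bottom-up merge that stops when no merge lowers the total cost. This search strategy is also our assumption.
- **Querying.** The helpers `build_summary` and `estimate_support` are hypothetical names. They illustrate querying the summary as a surrogate for the data by assuming independence between clusters.

```python
import math
from collections import Counter
from itertools import combinations


def entropy_bits(counts, n):
    # Empirical Shannon entropy (in bits) of the distribution given by counts.
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c)


def cluster_cost(data, cluster):
    # Two-part code length (bits) for one attribute cluster.
    # Data cost: n * H(cluster), the optimal code length for its value combinations.
    # Model cost (ASSUMPTION, crude stand-in): per distinct combination, one bit
    # per attribute (i.e. assumes binary attributes) plus log2(n) bits for its count.
    n = len(data)
    counts = Counter(tuple(row[a] for a in cluster) for row in data)
    data_cost = n * entropy_bits(counts, n)
    model_cost = len(counts) * (len(cluster) + math.log2(n))
    return data_cost + model_cost


def mdl_attribute_clustering(data, num_attrs):
    # Start from singleton clusters; greedily merge the pair whose merge
    # shortens the total description length the most (ASSUMED search strategy).
    clusters = [frozenset([a]) for a in range(num_attrs)]
    costs = {c: cluster_cost(data, sorted(c)) for c in clusters}
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for x, y in combinations(clusters, 2):
            gain = costs[x] + costs[y] - cluster_cost(data, sorted(x | y))
            if gain > best_gain:
                best_gain, best_pair = gain, (x, y)
        if best_pair is None:  # no merge compresses the data further
            break
        x, y = best_pair
        merged = x | y
        clusters = [c for c in clusters if c not in (x, y)] + [merged]
        costs[merged] = cluster_cost(data, sorted(merged))
    return clusters


def build_summary(data, clusters):
    # Summary: for each cluster, the empirical distribution of its value combinations.
    n = len(data)
    summary = []
    for c in clusters:
        attrs = tuple(sorted(c))
        counts = Counter(tuple(row[a] for a in attrs) for row in data)
        summary.append((attrs, {combo: k / n for combo, k in counts.items()}))
    return summary


def estimate_support(summary, itemset):
    # Estimate the relative support of itemset {attribute: value} from the summary
    # alone, treating clusters as mutually independent.
    estimate = 1.0
    for attrs, dist in summary:
        part = {a: v for a, v in itemset.items() if a in attrs}
        if not part:
            continue
        estimate *= sum(p for combo, p in dist.items()
                        if all(combo[attrs.index(a)] == v for a, v in part.items()))
    return estimate


if __name__ == "__main__":
    # Toy data: attributes 0 and 1 are copies, as are 2 and 3; the pairs are independent.
    data = [(x, x, y, y) for x in (0, 1) for y in (0, 1)] * 5
    clusters = mdl_attribute_clustering(data, 4)
    print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3]]
    summary = build_summary(data, clusters)
    print(estimate_support(summary, {0: 1, 2: 1}))  # 0.25
```

On this toy data, merging each pair of copied attributes saves bits. Merging across the independent pairs costs extra model bits without shortening the data code. The search therefore stops at the two correlated groups. No distance measure between attributes is involved at any point; the only criterion is total code length.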