Identifying the components

Authors:
Matthijs van Leeuwen;Jilles Vreeken;Arno Siebes
Affiliations:
Department of Computer Science, Universiteit Utrecht, Utrecht, The Netherlands;Department of Computer Science, Universiteit Utrecht, Utrecht, The Netherlands;Department of Computer Science, Universiteit Utrecht, Utrecht, The Netherlands
Venue:
Data Mining and Knowledge Discovery
Year:
2009

Abstract

Most, if not all, databases are mixtures of samples from different distributions, and transactional data is no exception. In the prototypical example, supermarket basket analysis, one expects a mixture of different buying patterns: households of retired people buy different collections of items than households with young children. Models that take such underlying distributions into account are in general superior to those that do not. In this paper we introduce two MDL-based algorithms that follow orthogonal approaches to identify the components in a transaction database. The first follows a model-based approach, while the second is data-driven. Both are parameter-free: the number of components and the components themselves are chosen such that the combined complexity of data and models is minimised. Further, neither prior knowledge of the distributions nor a distance metric on the data is required. Experiments with both methods show that highly characteristic components are identified.
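
The selection criterion in the abstract, choosing the components so that the combined complexity of data and models is minimised, can be illustrated with a small sketch. This is not the authors' algorithm: the independent-items code, the half-log parameter cost, the per-transaction label cost, and the names `component_bits` and `total_bits` are all assumptions made for illustration. The sketch only shows how the total encoded size of two candidate partitions might be compared.

```python
import math
from collections import Counter

def component_bits(transactions, items):
    """Bits to encode one component: parameter cost plus data cost.

    Assumption: an independent-items (Bernoulli) code, chosen for brevity;
    it is a stand-in, not the compressor used in the paper.
    """
    n = len(transactions)
    if n == 0:
        return 0.0
    counts = Counter(item for t in transactions for item in t)
    data_bits = 0.0
    for item in items:
        c = counts[item]
        p = (c + 0.5) / (n + 1.0)  # smoothed so p is never 0 or 1
        data_bits -= c * math.log2(p) + (n - c) * math.log2(1.0 - p)
    model_bits = 0.5 * len(items) * math.log2(n)  # assumed cost of the parameters
    return model_bits + data_bits

def total_bits(partition, items):
    """Combined size of all components plus one component label per transaction."""
    n = sum(len(part) for part in partition)
    label_bits = n * math.log2(len(partition))
    return label_bits + sum(component_bits(part, items) for part in partition)

# Hypothetical toy data echoing the abstract's two kinds of households.
items = ["tea", "biscuits", "diapers", "milk"]
retired = [frozenset({"tea", "biscuits"})] * 20
family = [frozenset({"diapers", "milk"})] * 20

one_component = total_bits([retired + family], items)
two_components = total_bits([retired, family], items)
print(two_components < one_component)
```

Under this toy code, the partition that separates the two buying patterns needs fewer bits in total than the single mixed component, even after paying for the extra model and the component labels. A parameter-free method in this spirit would keep whichever partition gives the smallest total.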