ITCH: information-theoretic cluster hierarchies

Authors:
Christian Böhm;Frank Fiedler;Annahita Oswald;Claudia Plant;Bianca Wackersreuther;Peter Wackersreuther
Affiliations:
University of Munich, Munich, Germany;University of Munich, Munich, Germany;University of Munich, Munich, Germany;Florida State University, Tallahassee, FL;University of Munich, Munich, Germany;University of Munich, Munich, Germany
Venue:
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
Year:
2010



Abstract

Hierarchical clustering methods are widely used in scientific domains such as molecular biology, medicine, and economics. Despite the maturity of the field, we have identified four goals that previous methods do not yet fully satisfy: first, to guide the hierarchical clustering algorithm to identify only meaningful and valid clusters; second, to represent each cluster in the hierarchy by an intuitive description, e.g. a probability density function; third, to handle outliers consistently; and finally, to avoid difficult parameter settings.

With ITCH, we propose a novel clustering method built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL), referred to as hMDL. Interpreting the hierarchical cluster structure as a statistical model of the data set, it can be used for effective data compression by Huffman coding. Thus, the achievable compression rate induces a natural objective function for clustering, which automatically satisfies all four of the above-mentioned goals.
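
To illustrate how a compression rate can act as a clustering objective, here is a minimal sketch of a generic, flat two-part MDL score for one-dimensional Gaussian clusters. It is not the hMDL criterion of ITCH, whose exact form the abstract does not give. The function names, the grid `resolution`, and the `0.5 * log2(n)` bits charged per parameter are assumptions made for this illustration. The data cost uses the ideal code length of -log2(p) bits per point, which Huffman coding approximates.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def data_cost_bits(points, clusters, resolution=0.01):
    """Bits needed to encode the points given the model.

    clusters is a list of (weight, mean, std) tuples (hypothetical layout).
    Each point costs -log2(density * resolution) bits, i.e. the ideal code
    length of the grid cell of width `resolution` that contains it.
    """
    total = 0.0
    for x in points:
        density = sum(w * gaussian_pdf(x, m, s) for w, m, s in clusters)
        total += -math.log2(density * resolution)
    return total


def model_cost_bits(n_points, clusters, params_per_cluster=3):
    """Bits needed to encode the model itself.

    Assumption: every parameter (weight, mean, std) costs 0.5 * log2(n) bits.
    """
    return 0.5 * params_per_cluster * len(clusters) * math.log2(n_points)


def description_length(points, clusters):
    """Two-part description length: model cost plus data cost."""
    return model_cost_bits(len(points), clusters) + data_cost_bits(points, clusters)


# Toy data: two tight groups around 0 and 10 (made up for this example).
points = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]

one_cluster = [(1.0, 5.0, 5.0)]
two_clusters = [(0.5, 0.0, 0.15), (0.5, 10.0, 0.15)]

dl_one = description_length(points, one_cluster)
dl_two = description_length(points, two_clusters)
print(f"one cluster:  {dl_one:.1f} bits")
print(f"two clusters: {dl_two:.1f} bits")
```

On this toy data the two-cluster model pays more bits for its parameters but saves more on the data, so its total description length is lower. A criterion of this kind can therefore choose between candidate cluster structures without a user-set number of clusters.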