Information-theoretic measures form a fundamental class of measures for comparing clusterings and have recently received increasing interest. Nevertheless, a number of questions concerning their properties and inter-relationships remain unresolved. In this paper, we conduct an organized study of information-theoretic measures for clustering comparison, covering several popular existing measures as well as some newly proposed ones. We discuss and prove their important properties, such as the metric property and the normalization property. We then highlight to the clustering community the importance of correcting information-theoretic measures for chance, especially when the data size is small compared with the number of clusters present. Among the available information-theoretic measures, we advocate the normalized information distance (NID) as the general measure of choice, since it simultaneously possesses several important properties: it is both a metric and a normalized measure, it admits an exact analytical adjusted-for-chance form, and it uses the nominal [0,1] range better than other normalized variants.
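As an illustration of the advocated measure, the NID of two clusterings can be computed directly from their empirical label distributions via the standard definition NID(U, V) = 1 − I(U; V) / max(H(U), H(V)). The sketch below is a minimal plain-Python implementation, assuming clusterings are given as equal-length lists of cluster labels (the function name `nid` and this input format are our illustrative choices, not notation from the paper):

```python
import math
from collections import Counter

def nid(labels_u, labels_v):
    """Normalized information distance between two clusterings.

    Computes NID(U, V) = 1 - I(U; V) / max(H(U), H(V)) from the
    empirical joint distribution of cluster labels (natural log).
    """
    n = len(labels_u)
    if n == 0 or n != len(labels_v):
        raise ValueError("clusterings must be non-empty and of equal length")

    count_u = Counter(labels_u)              # marginal counts for U
    count_v = Counter(labels_v)              # marginal counts for V
    count_uv = Counter(zip(labels_u, labels_v))  # joint (contingency) counts

    # Marginal entropies H(U), H(V)
    h_u = -sum(c / n * math.log(c / n) for c in count_u.values())
    h_v = -sum(c / n * math.log(c / n) for c in count_v.values())

    # Mutual information I(U; V) over non-zero joint cells
    mi = sum(c / n * math.log((c / n) / ((count_u[u] / n) * (count_v[v] / n)))
             for (u, v), c in count_uv.items())

    h_max = max(h_u, h_v)
    return 0.0 if h_max == 0 else 1 - mi / h_max

# Identical clusterings (up to relabeling) have distance ~0;
# statistically independent ones have distance 1.
print(nid([0, 0, 1, 1], [1, 1, 0, 0]))  # ~0.0: relabeling does not matter
print(nid([0, 0, 1, 1], [0, 1, 0, 1]))  # 1.0: independent clusterings
```

Note that this plain form is not corrected for chance; as the abstract stresses, when the number of points is small relative to the number of clusters, the adjusted-for-chance variant should be preferred.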