Information-theoretic measures form a fundamental class of similarity measures for comparing clusterings, alongside pair-counting and set-matching measures. In this paper, we discuss the need to correct information-theoretic measures for chance when comparing clusterings. We observe that the baseline for such measures, i.e., the average value between random partitions of a data set, is not constant, and that it tends to vary more when the ratio of the number of data points to the number of clusters is small. A similar effect is seen in some non-information-theoretic measures, such as the well-known Rand Index. Assuming a hypergeometric model of randomness, we derive an analytical formula for the expected mutual information between a pair of clusterings, and then propose adjusted versions of several popular information-theoretic measures. Examples demonstrate the need for, and the usefulness of, the adjusted measures.
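The correction described above can be illustrated with a minimal Python sketch. This is not the paper's own code. It assumes the usual closed form for the expected mutual information when both clusterings keep their cluster sizes and the assignments are randomly permuted (the hypergeometric model), and it normalizes by the larger of the two entropies, which is only one of several possible choices. The function names, and the convention of returning 1.0 when the denominator vanishes, are illustrative assumptions.

```python
from collections import Counter
from math import exp, isclose, lgamma, log


def log_factorial(n):
    """Return log(n!) via the log-gamma function."""
    return lgamma(n + 1)


def entropy(labels):
    """Shannon entropy (natural log) of a clustering given as a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information (natural log) between two label lists."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum(
        (nij / n) * log(n * nij / (a[i] * b[j]))
        for (i, j), nij in joint.items()
    )


def expected_mutual_information(u, v):
    """Expected MI under the hypergeometric model (fixed cluster sizes)."""
    n = len(u)
    emi = 0.0
    for ai in Counter(u).values():
        for bj in Counter(v).values():
            # Overlaps of size 0 contribute nothing, so start at 1.
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                # Log-probability of an overlap of exactly nij points.
                log_p = (
                    log_factorial(ai) + log_factorial(bj)
                    + log_factorial(n - ai) + log_factorial(n - bj)
                    - log_factorial(n) - log_factorial(nij)
                    - log_factorial(ai - nij) - log_factorial(bj - nij)
                    - log_factorial(n - ai - bj + nij)
                )
                emi += (nij / n) * log(n * nij / (ai * bj)) * exp(log_p)
    return emi


def adjusted_mutual_information(u, v):
    """Chance-corrected MI, normalized by max entropy (an assumption here)."""
    mi = mutual_information(u, v)
    emi = expected_mutual_information(u, v)
    denom = max(entropy(u), entropy(v)) - emi
    if isclose(denom, 0.0, abs_tol=1e-12):
        # Degenerate case (e.g. both clusterings trivial); convention chosen
        # for this sketch, not taken from the source.
        return 1.0
    return (mi - emi) / denom
```

For the two clusterings `[0, 0, 1, 1]` and `[0, 1, 0, 1]`, the raw mutual information is 0, the expected value under the model is ln(2)/3, and the adjusted score is therefore -0.5: the pair agrees less than random partitions with the same cluster sizes would on average.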