In this paper we propose a measure of similarity/association between two partitions of a set of objects. Our motivation is to use the measure to characterize the quality or accuracy of clustering algorithms by comparing the clusters they produce with "ground truth": classes assigned manually or by some other means whose veracity we trust. Such measures are referred to as "external". Our measure also allows clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels are as predictors of the class labels. It computes the reduction in the number of bits that would be required to encode (compress) the class labels if both the encoder and the decoder have free access to the cluster labels. To achieve this encoding, the estimated conditional probabilities of the class labels given the cluster labels must also be encoded. In addition to defining the measure, we compare it to other commonly used external measures and demonstrate its superiority as judged by certain criteria.
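The encoding argument can be illustrated with a minimal sketch, assuming a simple two-part code. The label bits are approximated by n·H computed from empirical probabilities (so the per-cluster terms sum to n·H(C|K)). The cost of the estimated probabilities is modeled as log2 of the number of possible class-count vectors for each cluster (a stars-and-bars count). The baseline, which has no cluster labels, pays the same kind of cost once for the whole data set. The function names (`compression_gain`, `label_entropy_bits`, `count_code_bits`) are hypothetical, and this particular parameter cost is an illustrative assumption, not necessarily the exact model-cost term used in the paper.

```python
from collections import Counter
from math import comb, log2


def label_entropy_bits(counts):
    """Total bits n*H for labels with these counts, using empirical probabilities."""
    n = sum(counts)
    return -sum(c * log2(c / n) for c in counts if c > 0)


def count_code_bits(n_items, n_classes):
    """Assumed parameter cost: log2 of the number of class-count vectors that
    sum to n_items over n_classes classes, C(n_items + n_classes - 1, n_classes - 1)."""
    return log2(comb(n_items + n_classes - 1, n_classes - 1))


def compression_gain(class_labels, cluster_labels):
    """Bits saved encoding the class labels when the cluster labels are known
    (hypothetical sketch of the two-part-code idea, not the paper's formula)."""
    if len(class_labels) != len(cluster_labels):
        raise ValueError("label sequences must have equal length")
    n = len(class_labels)
    n_classes = len(set(class_labels))

    # Baseline: encode the overall class counts, then the labels themselves.
    bits_without = (label_entropy_bits(Counter(class_labels).values())
                    + count_code_bits(n, n_classes))

    # With cluster labels: group items by cluster, then for each cluster
    # encode its class counts (parameter cost) and its class labels.
    by_cluster = {}
    for c, k in zip(class_labels, cluster_labels):
        by_cluster.setdefault(k, []).append(c)
    bits_with = 0.0
    for members in by_cluster.values():
        bits_with += label_entropy_bits(Counter(members).values())
        bits_with += count_code_bits(len(members), n_classes)

    return bits_without - bits_with


classes = [0, 0, 1, 1]
print(compression_gain(classes, ["a", "a", "b", "b"]))  # clusters match classes
print(compression_gain(classes, ["a", "b", "a", "b"]))  # clusters carry no class information
print(compression_gain(classes, [0, 1, 2, 3]))          # one item per cluster
```

Under these assumptions, a clustering that matches the classes yields a positive gain. A single all-inclusive cluster yields exactly zero. A clustering that is independent of the classes yields a negative gain, because it pays for extra parameters without shortening the label code. Because every result is expressed in bits, clusterings with different numbers of clusters (here two clusters versus four singletons) land on the same scale and can be compared directly.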