An information-theoretic external cluster-validity measure

Authors:
Byron E. Dom
Affiliations:
IBM Research Division, San Jose, CA
Venue:
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
Year:
2002

Cited by:
- Non-redundant clustering with conditional ensembles. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
- Modeling user interests by conceptual clustering. Information Systems, Special issue: The semantic web and web services.
- A conceptual clustering approach for user profiling in personal information agents. AI Communications.
- An efficient approach to external cluster assessment with an application to Martian topography. Data Mining and Knowledge Discovery.
- Modeling user interests by conceptual clustering. Information Systems.
- Scalable clustering of news search results. Proceedings of the fourth ACM international conference on Web search and data mining.
- Genetic algorithm for finding cluster hierarchies. DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications, Volume Part I.
- Integrative parameter-free clustering of data with mixed type attributes. PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining, Volume Part I.

Abstract

In this paper we propose a measure of similarity/association between two partitions of a set of objects. Our motivation is the desire to use the measure to characterize the quality or accuracy of clustering algorithms by comparing the clusters they produce with "ground truth": classes assigned manually or by some other means whose veracity is trusted. Such measures are referred to as "external". Our measure also allows clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels are as predictors of the class labels. It computes the reduction in the number of bits that would be required to encode (compress) the class labels if both the encoder and decoder have free access to the cluster labels. To achieve this encoding, the estimated conditional probabilities of the class labels given the cluster labels must also be encoded. In addition to defining the measure, we compare it with other commonly used external measures and demonstrate its superiority as judged by certain criteria.
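The idea of "bits saved on the class labels, after paying for the conditional probabilities" can be illustrated with a small two-part code. The sketch below is not the paper's exact formula. It assumes that the class labels within each cluster k of size h_k are coded with the empirical conditional distribution, costing h_k times the empirical entropy. It also assumes that the model cost of sending that cluster's class-count vector is log2 C(h_k + |C| - 1, |C| - 1) bits, the number of ways to split h_k items among |C| classes, and that |C| is known to both sides. The function names are hypothetical. The baseline is the same coder applied to a single all-inclusive cluster, so the reduction is measured against "no cluster information".

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def code_length(classes: Sequence[Hashable], clusters: Sequence[Hashable], num_classes: int) -> float:
    """Bits to send the class labels given the cluster labels (illustrative two-part code)."""
    counts_by_cluster = defaultdict(Counter)
    for c, k in zip(classes, clusters):
        counts_by_cluster[k][c] += 1

    data_bits = 0.0   # sum over clusters of h_k * empirical H(C | K = k)
    model_bits = 0.0  # assumed cost of sending each cluster's class-count vector
    for counts in counts_by_cluster.values():
        h = sum(counts.values())
        data_bits -= sum(m * math.log2(m / h) for m in counts.values())
        # Number of possible count vectors for h items over num_classes classes.
        model_bits += math.log2(math.comb(h + num_classes - 1, num_classes - 1))
    return data_bits + model_bits


def bit_reduction(classes: Sequence[Hashable], clusters: Sequence[Hashable]) -> float:
    """Bits saved by knowing the clustering, relative to a single-cluster baseline."""
    num_classes = len(set(classes))
    baseline = code_length(classes, [0] * len(classes), num_classes)
    return baseline - code_length(classes, clusters, num_classes)


if __name__ == "__main__":
    classes = ["a"] * 4 + ["b"] * 4
    print(bit_reduction(classes, [0] * 4 + [1] * 4))  # clusters match the classes
    print(bit_reduction(classes, [0, 1] * 4))         # clusters mix the classes
```

Under these assumptions, a clustering that matches the classes saves bits. A single cluster saves exactly zero by construction. A split that mixes the classes scores negative, because it pays extra model bits without shortening the label encoding. This is how a compression-based score can compare clusterings with different numbers of clusters on one scale.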