An information-theoretic external cluster-validity measure

  • Authors:
  • Byron E. Dom

  • Affiliations:
  • IBM Research Division, San Jose, CA

  • Venue:
  • UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we propose a measure of similarity/ association between two partitions of a set of objects. Our motivation is the desire to use the measure to characterize the quality or accuracy of clustering algorithms by somehow comparing the clusters they produce with "ground truth" consisting of classes assigned by manual means or some other means in whose veracity there is confidence. Such measures are referred to as "external". Our measure also allows clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels are as predictors of their class labels. It computes the reduction in the number of bits that would be required to encode (comress) the class labels if both the encoder and decoder have free access to the cluster labels. To achieve this encoding the estimated conditional probabilities of the class labels given the cluster labels must also be encoded. In addition to defining the measure we compare it to other commonly used external measures and demonstrate its superiority as judged by certain criteria.