Dissimilarity and similarity measures for comparing dendrograms and their applications

  • Authors: Isabella Morlini; Sergio Zani
  • Affiliations: Department of Economics, University of Modena and Reggio Emilia, Reggio Emilia, Italy; Department of Economics, University of Parma, Parma, Italy
  • Venue: Advances in Data Analysis and Classification
  • Year: 2012

Abstract

In this paper we propose a new index, Z, for measuring the dissimilarity between two hierarchical clusterings (or dendrograms). The index is a metric: it satisfies the axioms of non-negativity, symmetry and the triangle inequality. A desirable property is that Z can be decomposed into the contributions pertaining to each stage of the hierarchies. We show how these components relate to the criteria currently used for comparing two partitions. We obtain a global similarity index as the complement to one of the proposed dissimilarity, and we derive its adjustment for agreement due to chance. Similarity indexes pertaining to each stage of the hierarchies are obtained likewise, as the complement to one of the additive parts of the global distance Z. We also consider the use of the proposed distance with more than two dendrograms, and its use for the consensus of classifications and for variable selection in cluster analysis. A series of simulation experiments and an application to a real data set are presented.
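
The stage-wise decomposition described in the abstract can be illustrated with a small sketch. The abstract does not give the formula for Z, so everything below is an assumption made for illustration: each hierarchy is represented as a hypothetical dictionary mapping a number of clusters k to a list of cluster labels, each partition is encoded as a binary co-membership vector over all pairs of objects, and the dissimilarity is taken as the sum of the stage-wise L1 differences divided by the total number of co-clustered pairs in both hierarchies. The function names `comembership` and `stage_contributions` are invented here, and no claim is made that this normalization is the one used in the paper or that it inherits the metric properties of Z.

```python
from fractions import Fraction
from itertools import combinations


def comembership(labels):
    """Binary vector over all object pairs: 1 if the pair shares a cluster."""
    return [int(labels[i] == labels[j])
            for i, j in combinations(range(len(labels)), 2)]


def stage_contributions(hier_a, hier_b):
    """Per-stage parts of an illustrative normalized L1 dissimilarity.

    hier_a, hier_b: dicts mapping k (number of clusters) to a label list.
    The normalization is an assumption, not taken from the abstract.
    """
    stages = sorted(set(hier_a) & set(hier_b))
    vec_a = {k: comembership(hier_a[k]) for k in stages}
    vec_b = {k: comembership(hier_b[k]) for k in stages}
    # Total number of co-clustered pairs across both hierarchies.
    total = sum(sum(vec_a[k]) + sum(vec_b[k]) for k in stages)
    if total == 0:  # no pair is ever joined: treat the hierarchies as identical
        return {k: Fraction(0) for k in stages}
    return {
        k: Fraction(sum(abs(x - y) for x, y in zip(vec_a[k], vec_b[k])), total)
        for k in stages
    }


# Two toy hierarchies on four objects, cut at k = 3 and k = 2 clusters.
hier_a = {3: [0, 0, 1, 2], 2: [0, 0, 1, 1]}
hier_b = {3: [0, 1, 1, 2], 2: [0, 1, 1, 1]}

parts = stage_contributions(hier_a, hier_b)
dissimilarity = sum(parts.values())   # global value: sum of the stage parts
similarity = 1 - dissimilarity        # complement to one
print(parts, dissimilarity, similarity)
```

In this toy case the two stages contribute 2/7 and 3/7, so the global dissimilarity is 5/7 and its complement to one is 2/7. Exact fractions are used only to make the additive decomposition easy to check by hand.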