Correcting Jaccard and other similarity indices for chance agreement in cluster analysis

  • Authors:
  • Ahmed N. Albatineh; Magdalena Niewiadomska-Bugaj

  • Affiliations:
  • Department of Epidemiology and Biostatistics, Florida International University, Miami, USA; Department of Statistics, Western Michigan University, Kalamazoo, USA

  • Venue:
  • Advances in Data Analysis and Classification
  • Year:
  • 2011

Abstract

Correcting a similarity index for chance agreement requires computing its expectation under fixed marginal totals of a matching counts matrix. For some indices, such as Jaccard, Rogers and Tanimoto, Sokal and Sneath, and Gower and Legendre, the expectations cannot be easily found. We show how such similarity indices can be expressed as functions of other indices whose expectations can be approximated, which makes an approximate correction possible. A second approach is based on a Taylor series expansion. A simulation study illustrates the effectiveness of the resulting corrections using structured and unstructured data generated from bivariate normal distributions.
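To make the idea concrete, here is a minimal Python sketch for the Jaccard index only. It works from a matching counts matrix `table[i][j]` (objects in cluster i of the first partition and cluster j of the second) and derives the usual pair counts a, b, c, d. It assumes several things: that the correction takes the common form (J - E[J]) / (1 - E[J]) with maximum value 1, and that E[a] = P·Q / C(n, 2) under fixed marginals, where P = a + b and Q = a + c. Because the marginals fix P and Q, Jaccard can be written as J = a / (P + Q - a), a function of a alone. E[J] is then approximated to first order as E[a] / (P + Q - E[a]). That ratio-of-expectations step is an assumption of this sketch and not necessarily the authors' formula. The function names are hypothetical. A Monte Carlo estimate under the random permutation model is included to check the approximation.

```python
import random
from math import comb


def pair_counts(table):
    """Pair counts (a, b, c, d) from a matching counts matrix table[i][j] = n_ij.

    a: pairs together in both partitions; b: together only in the first (rows);
    c: together only in the second (columns); d: apart in both.
    """
    n = sum(sum(row) for row in table)
    a = sum(comb(x, 2) for row in table for x in row)
    p = sum(comb(sum(row), 2) for row in table)        # a + b, fixed by row totals
    q = sum(comb(sum(col), 2) for col in zip(*table))  # a + c, fixed by column totals
    b, c = p - a, q - a
    d = comb(n, 2) - a - b - c
    return a, b, c, d


def jaccard(table):
    """Jaccard index a / (a + b + c); undefined when a + b + c == 0."""
    a, b, c, _ = pair_counts(table)
    return a / (a + b + c)


def expected_a(table):
    """E[a] under fixed marginals: sum_i C(n_i., 2) * sum_j C(n_.j, 2) / C(n, 2)."""
    a, b, c, _ = pair_counts(table)
    n = sum(sum(row) for row in table)
    return (a + b) * (a + c) / comb(n, 2)


def corrected_jaccard_approx(table):
    """Approximate chance-corrected Jaccard, (J - E[J]) / (1 - E[J]).

    Assumption of this sketch: E[J] ~ E[a] / (S - E[a]) with S = (a + b) + (a + c),
    i.e. a first-order (ratio of expectations) approximation.
    """
    a, b, c, _ = pair_counts(table)
    s = (a + b) + (a + c)  # constant once the marginals are fixed
    ea = expected_a(table)
    j = a / (s - a)        # same value as jaccard(table)
    ej = ea / (s - ea)
    return (j - ej) / (1 - ej)


def expected_jaccard_mc(table, reps=2000, seed=0):
    """Monte Carlo estimate of E[J] under fixed marginals (random permutation model)."""
    rng = random.Random(seed)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # One cluster label per object for each partition, consistent with the marginals.
    u = [i for i, r in enumerate(row_tot) for _ in range(r)]
    v = [j for j, k in enumerate(col_tot) for _ in range(k)]
    total = 0.0
    for _ in range(reps):
        rng.shuffle(v)  # randomly re-pair the labelings; both marginals stay fixed
        t = [[0] * len(col_tot) for _ in row_tot]
        for i, j in zip(u, v):
            t[i][j] += 1
        total += jaccard(t)
    return total / reps


if __name__ == "__main__":
    table = [[2, 1], [1, 2]]
    print(jaccard(table))                   # 0.2
    print(corrected_jaccard_approx(table))  # about -0.0667
    print(expected_jaccard_mc(table))       # compare with the first-order E[J] = 0.25
```

The first-order approximation ignores the curvature of f(a) = a / (S - a). A generic second-order (delta-method) refinement adds ½ f''(E[a]) Var(a) = S · Var(a) / (S - E[a])^3 to E[J]. It is omitted here because Var(a) under fixed marginals takes more work to compute. The Monte Carlo function gives a rough way to judge how much that term matters for a given table.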