Correcting Jaccard and other similarity indices for chance agreement in cluster analysis

Authors:
Ahmed N. Albatineh;Magdalena Niewiadomska-Bugaj
Affiliations:
Department of Epidemiology and Biostatistics, Florida International University, Miami, USA;Department of Statistics, Western Michigan University, Kalamazoo, USA
Venue:
Advances in Data Analysis and Classification
Year:
2011

Abstract

Correcting a similarity index for chance agreement requires computing its expectation under fixed marginal totals of a matching counts matrix. For some indices, such as Jaccard, Rogers and Tanimoto, Sokal and Sneath, and Gower and Legendre, the expectations cannot be easily found. We show how such similarity indices can be expressed as functions of other indices, and their expectations found by approximation, so that an approximate correction is possible. A second approach is based on Taylor series expansion. A simulation study illustrates the effectiveness of the resulting corrections of similarity indices using structured and unstructured data generated from bivariate normal distributions.
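
To make the idea of chance correction concrete, the following minimal Python sketch computes the Jaccard index from the pair counts of a matching counts matrix and applies the standard correction form (index - expectation) / (1 - expectation). It is not the authors' method: the expectation is estimated here by Monte Carlo permutation of one labeling, which keeps both sets of cluster sizes (the marginal totals) fixed, rather than by the analytical approximations or the Taylor series expansion described in the abstract. The function names `pair_counts`, `jaccard`, and `corrected_jaccard`, and the parameters `n_perm` and `seed`, are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Pair counts derived from the matching counts (contingency) matrix.

    Returns (a, b, c): pairs placed together in both clusterings,
    together only in the first, and together only in the second.
    """
    cells = Counter(zip(labels_a, labels_b))  # matching counts n_ij
    rows = Counter(labels_a)                  # row marginal totals
    cols = Counter(labels_b)                  # column marginal totals
    a = sum(comb(v, 2) for v in cells.values())
    b = sum(comb(v, 2) for v in rows.values()) - a
    c = sum(comb(v, 2) for v in cols.values()) - a
    return a, b, c


def jaccard(labels_a, labels_b):
    """Jaccard index a / (a + b + c) between two clusterings."""
    a, b, c = pair_counts(labels_a, labels_b)
    denom = a + b + c
    return a / denom if denom else 1.0


def corrected_jaccard(labels_a, labels_b, n_perm=2000, seed=0):
    """Chance-corrected Jaccard index.

    ASSUMPTION: the expectation under fixed marginals is estimated by
    random permutation, not by the paper's approximations.
    """
    rng = random.Random(seed)
    observed = jaccard(labels_a, labels_b)
    shuffled = list(labels_b)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permuting labels preserves cluster sizes
        total += jaccard(labels_a, shuffled)
    expected = total / n_perm
    if expected == 1.0:
        # Degenerate case: every permutation agrees perfectly.
        return float("nan")
    return (observed - expected) / (1.0 - expected)
```

Under this sketch, two identical clusterings receive a corrected value of 1, while the corrected value for unrelated clusterings should fall near 0, subject to Monte Carlo error that shrinks as `n_perm` grows.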