Clustering under approximation stability
Journal of the ACM (JACM)
An indication of unification for different clustering approaches
Pattern Recognition
Interpretable clustering using unsupervised binary trees
Advances in Data Analysis and Classification
Ensemble clustering by means of clustering embedding in vector spaces
Pattern Recognition
Behavior-based clustering and analysis of interestingness measures for association rule mining
Data Mining and Knowledge Discovery
Several different distances and indices are in use for comparing clusterings. We prove that the Misclassification Error distance, the Hamming distance (equivalent to the unadjusted Rand index), and the χ² distance between partitions are equivalent in the neighborhood of 0. In other words, if two partitions are very similar, then one distance defines upper and lower bounds on the other, and vice versa. The proofs are geometric and rely on the concavity of the distances. The geometric intuitions themselves advance the understanding of the space of all clusterings. To our knowledge, this is the first result of its kind.

In practice, distances are frequently used to compare two clusterings of a set of observations, but the motivation for this work lies in the theoretical study of data clustering. Distances between partitions are involved in constructing new methods for cluster validation, determining the number of clusters, and analyzing clustering algorithms. From a probability theory point of view, the present results apply to any pair of finite-valued random variables. They provide simple yet tight upper and lower bounds on the χ² measure of (in)dependence, valid when the two variables are strongly dependent.
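To make the three distances concrete, here is a minimal Python sketch that computes each of them from two label vectors. The function names are hypothetical, and the exact normalizations below are assumptions that may differ from the paper's definitions:

- The misclassification error is taken as 1 minus the largest fraction of points covered by a one-to-one matching of clusters. It is found by brute force, so it only suits a handful of clusters.
- The Hamming distance is taken as the fraction of point pairs on which the two clusterings disagree (same cluster vs. different clusters), i.e. 1 minus the unadjusted Rand index.
- The χ² distance is assumed to be min(K, K') − Σ p_kk'² / (p_k p'_k'), which is zero when the partitions coincide.

```python
from collections import Counter
from itertools import combinations, permutations


def joint_distribution(labels_a, labels_b):
    """Return p[(k, k')]: fraction of points in cluster k of A and cluster k' of B."""
    n = len(labels_a)
    counts = Counter(zip(labels_a, labels_b))
    return {pair: c / n for pair, c in counts.items()}


def misclassification_error(labels_a, labels_b):
    """1 - best fraction of points matched by a one-to-one cluster pairing.

    Brute force over all matchings (assumption: few clusters); a Hungarian
    solver would be needed for larger K.
    """
    ka, kb = sorted(set(labels_a)), sorted(set(labels_b))
    if len(ka) > len(kb):
        # Match the smaller clustering into the larger one; the measure is symmetric.
        return misclassification_error(labels_b, labels_a)
    p = joint_distribution(labels_a, labels_b)
    best = max(
        sum(p.get((k, m), 0.0) for k, m in zip(ka, match))
        for match in permutations(kb, len(ka))
    )
    return 1.0 - best


def hamming_distance(labels_a, labels_b):
    """Fraction of point pairs on which the clusterings disagree (1 - unadjusted Rand).

    Assumes at least two points.
    """
    n = len(labels_a)
    disagree = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in combinations(range(n), 2)
    )
    return disagree / (n * (n - 1) // 2)


def chi2_distance(labels_a, labels_b):
    """Assumed form: min(K, K') - sum_{k,k'} p_kk'^2 / (p_k * p'_k')."""
    n = len(labels_a)
    p = joint_distribution(labels_a, labels_b)
    pa = {k: c / n for k, c in Counter(labels_a).items()}  # marginal of A
    pb = {k: c / n for k, c in Counter(labels_b).items()}  # marginal of B
    chi2 = sum(v * v / (pa[k] * pb[m]) for (k, m), v in p.items())
    return min(len(pa), len(pb)) - chi2
```

For example, take a = [0, 0, 0, 1, 1, 1] and b = [0, 0, 1, 1, 1, 1], where one point changes cluster. The sketch gives 1/6, 1/3, and 0.5 for the three distances. All three are 0 for a relabeled copy of a, such as [1, 1, 1, 0, 0, 0]. For larger numbers of clusters, the brute-force matching could be replaced by an assignment solver such as `scipy.optimize.linear_sum_assignment` with `maximize=True`.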