Resampling Method for Unsupervised Estimation of Cluster Validity. Neural Computation.
A sober look at clustering stability. COLT'06: Proceedings of the 19th Annual Conference on Learning Theory.
Multi-core parallelization in Clojure: a case study. Proceedings of the 6th European Lisp Workshop.
Clustering Stability: An Overview. Foundations and Trends in Machine Learning.
Center-based clustering under perturbation stability. Information Processing Letters.
RSQRT: An heuristic for estimating the number of clusters to report. Electronic Commerce Research and Applications.
Stratified k-means clustering over a deep web data source. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Comparing Clustering Techniques for Real Microarray Data. ASONAM '12: Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining.
Maximum volume clustering: a new discriminative clustering approach. The Journal of Machine Learning Research.
We consider the stability of k-means clustering problems. Clustering stability is a common heuristic used to determine the number of clusters in a wide variety of clustering applications. We continue the theoretical analysis of clustering stability by establishing a complete characterization of clustering stability in terms of the number of optimal solutions to the clustering optimization problem. Our results complement earlier work of Ben-David, von Luxburg and Pál by settling the main problem left open there. Our analysis shows that, for probability distributions with finite support, the stability of k-means clusterings depends solely on the number of optimal solutions to the underlying optimization problem for the data distribution. These results challenge the common belief and practice that views stability as an indicator of the validity, or meaningfulness, of the choice of a clustering algorithm and number of clusters.
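The stability heuristic the abstract analyzes can be made concrete with a small numpy sketch (an illustration in the spirit of resampling-based stability estimation, not the authors' construction; all names such as `stability` and `pair_agreement` are ours): fit k-means on two bootstrap resamples, extend each clustering to the full data set by nearest-centroid assignment, and score the agreement between the two labelings.

```python
import numpy as np

def init_centers(X, k, rng):
    """k-means++-style seeding: spread the initial centers apart."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd iterations; returns the final centers."""
    centers = init_centers(X, k, rng)
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers

def assign(X, centers):
    """Nearest-centroid label for every point in X."""
    return ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)

def pair_agreement(a, b):
    """Fraction of point pairs on which labelings a and b agree about
    being co-clustered or separated (invariant to label permutation)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    off_diag = ~np.eye(len(a), dtype=bool)
    return (same_a == same_b)[off_diag].mean()

def stability(X, k, n_pairs=10, seed=0):
    """Average agreement between clusterings fit on bootstrap resamples,
    each extended to the full data via nearest-centroid assignment."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_pairs):
        s1 = X[rng.choice(len(X), len(X), replace=True)]
        s2 = X[rng.choice(len(X), len(X), replace=True)]
        c1, c2 = kmeans(s1, k, rng), kmeans(s2, k, rng)
        scores.append(pair_agreement(assign(X, c1), assign(X, c2)))
    return float(np.mean(scores))

# Two well-separated Gaussian blobs: the k=2 optimum is essentially unique,
# so k=2 looks stable, while k=3 must split one blob arbitrarily.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 0.3, (100, 2)),
               data_rng.normal(3.0, 0.3, (100, 2))])
print("stability k=2:", stability(X, 2))
print("stability k=3:", stability(X, 3))
```

On this toy data the k=2 score sits near 1 while k=3 scores lower, which is exactly the behavior practitioners read as "k=2 is the right number of clusters." The abstract's point is that this gap is driven by uniqueness of the optimal solution for the data distribution, not by any intrinsic validity of the chosen k.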