Smoothed Analysis of the k-Means Method

Authors:
David Arthur;Bodo Manthey;Heiko Röglin
Affiliations:
Stanford University;University of Twente;University of Bonn
Venue:
Journal of the ACM (JACM)
Year:
2011

Abstract

The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this article, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
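
The smoothed model described in the abstract can be illustrated with a minimal sketch: perturb every coordinate of an arbitrary input by Gaussian noise with standard deviation σ, run the k-means method (Lloyd's iteration), and count the iterations until no point changes cluster. This is only an illustration of the setting, not the authors' analysis or code. The function names (`perturb`, `sq_dist`, `lloyd_iterations`), the choice of initial centers, the handling of empty clusters, and the `max_iter` cap are all assumptions made for the example.

```python
import random


def perturb(points, sigma, rng):
    """Add independent Gaussian noise N(0, sigma^2) to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_iterations(points, centers, max_iter=10_000):
    """Run Lloyd's k-means from the given centers.

    Returns (final_centers, iterations), where iterations counts the rounds
    in which at least one point changed its cluster (assumed convention).
    """
    centers = [tuple(c) for c in centers]
    assignment = None
    for it in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return centers, it - 1  # nothing changed: converged
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, max_iter


if __name__ == "__main__":
    rng = random.Random(0)
    base = [(rng.random(), rng.random()) for _ in range(200)]  # arbitrary input
    noisy = perturb(base, sigma=0.05, rng=rng)                 # smoothed input
    _, iters = lloyd_iterations(noisy, noisy[:3])              # k = 3
    print("iterations on perturbed input:", iters)
```

Averaging `iters` over many independent perturbations of the same base set gives an empirical stand-in for the smoothed number of iterations that the abstract bounds by a polynomial in n and 1/σ.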