k-Means Has Polynomial Smoothed Complexity

Authors:
David Arthur;Bodo Manthey;Heiko Röglin
Venue:
FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
Year:
2009

Citing 0
Cited 16

Adaptive Sampling for k-Means Clustering
APPROX '09 / RANDOM '09 Proceedings of the 12th International Workshop and 13th International Workshop on Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques

Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences
ISAAC '09 Proceedings of the 20th International Symposium on Algorithms and Computation

A bad instance for k-means++
TAMC'11 Proceedings of the 8th annual conference on Theory and applications of models of computation

Settling the complexity of local max-cut (almost) completely
ICALP'11 Proceedings of the 38th international colloquium conference on Automata, languages and programming - Volume Part I

Smoothed Analysis of the k-Means Method
Journal of the ACM (JACM)

Smoothed analysis of partitioning algorithms for Euclidean functionals
WADS'11 Proceedings of the 12th international conference on Algorithms and data structures

Granular-based partial periodic pattern discovery over time series data
RSKT'11 Proceedings of the 6th international conference on Rough sets and knowledge technology

Measuring query privacy in location-based services
Proceedings of the second ACM conference on Data and Application Security and Privacy

Bregman clustering for separable instances
SWAT'10 Proceedings of the 12th Scandinavian conference on Algorithm Theory

Using Clustering and Metric Learning to Improve Science Return of Remote Sensed Imagery
ACM Transactions on Intelligent Systems and Technology (TIST)

StreamKM++: A clustering algorithm for data streams
Journal of Experimental Algorithmics (JEA)

SCOUT: prefetching for latent structure following queries
Proceedings of the VLDB Endowment

The MADlib analytics library: or MAD skills, the SQL
Proceedings of the VLDB Endowment

A modification of the k-means method for quasi-unsupervised learning
Knowledge-Based Systems

A framework for evaluating the smoothness of data-mining results
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II

Fuzzy regularized generalized eigenvalue classifier with a novel membership function
Information Sciences: an International Journal

Abstract

The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/sigma, where sigma is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
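
The setting described in the abstract can be illustrated with a minimal Python sketch: perturb an input set with Gaussian noise of standard deviation sigma, run the k-means method (Lloyd's iteration), and count the iterations until the clustering stops changing. This is only an illustration under stated assumptions, not the analysis or the experimental setup of the paper. The function names `perturb` and `kmeans_iterations`, the rule of keeping a center unchanged when its cluster becomes empty, the tie-breaking by lowest center index, and the `max_iter` safety cap are all assumptions introduced here.

```python
import random


def perturb(points, sigma, rng):
    """Add independent Gaussian noise with standard deviation sigma to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_iterations(points, centers, max_iter=10_000):
    """Run the k-means method and return (iterations, final centers).

    An iteration is counted each time the assignment step changes the clustering.
    """
    centers = list(centers)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center
        # (ties broken by lowest center index; an assumption of this sketch).
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return iteration - 1, centers  # clustering is stable
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return max_iter, centers  # safety cap reached


if __name__ == "__main__":
    rng = random.Random(0)
    base = [(rng.random(), rng.random()) for _ in range(200)]
    noisy = perturb(base, sigma=0.05, rng=rng)
    iterations, _ = kmeans_iterations(noisy, centers=noisy[:3])
    print("iterations on perturbed input:", iterations)
```

Repeating the run for several values of sigma and input sizes n gives an empirical feel for how the iteration count behaves on perturbed inputs; it does not, of course, substitute for the polynomial bound in n and 1/sigma that the paper proves.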