The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/sigma, where sigma is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
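To make the setting concrete, the following is a minimal sketch of the experiment the smoothed model describes: every coordinate of an arbitrary input set is perturbed by independent Gaussian noise with standard deviation sigma, the k-means method (Lloyd's algorithm) is run on the perturbed set, and its iterations are counted. The function names, the choice of initial centers, and the stopping rule (stop once the assignment no longer changes) are assumptions made for illustration; they are not taken from the paper.

```python
import random


def perturb(points, sigma, rng):
    """Add independent Gaussian noise (std. dev. sigma) to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_iterations(points, centers, max_iter=10_000):
    """Run the k-means method; return (iterations, centers, labels).

    An iteration is counted each time the assignment of points to
    centers changes (assumed stopping rule: stop when it is stable).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            return iteration - 1, centers, labels
        labels = new_labels
        # Update step: each center moves to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return max_iter, centers, labels


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0),
            (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
    noisy = perturb(data, sigma=0.1, rng=rng)
    # Hypothetical initialization: use the first k perturbed points.
    iters, centers, labels = kmeans_iterations(noisy, noisy[:3])
    print("iterations until convergence:", iters)
```

Averaging the iteration count over many independent perturbations of the same input gives an empirical estimate of the smoothed number of iterations, which is the quantity the paper bounds by a polynomial in n and 1/sigma.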