Improved smoothed analysis of the k-means method

  • Authors:
  • Bodo Manthey; Heiko Röglin

  • Affiliations:
  • Saarland University; Boston University

  • Venue:
  • SODA '09: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2009

Abstract

The k-means method is a widely used clustering algorithm. One of its distinguishing features is its speed in practice. Its worst-case running time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii [3] aimed at closing this gap, and they proved a bound of poly(n^k, σ^{-1}) on the smoothed running time of the k-means method, where n is the number of data points and σ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running time observed in practice. We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running time of k-means. First, we prove that the expected running time is bounded by a polynomial in n^{√k} and σ^{-1}. Second, we prove an upper bound of k^{kd} · poly(n, σ^{-1}), where d is the dimension of the data space. Here the polynomial is independent of k and d, so we obtain a polynomial bound on the expected running time for k, d ∈ O(√(log n / log log n)). Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.
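
To make the setting concrete, here is a minimal sketch, assuming a standard formulation of the k-means method (Lloyd's algorithm) and the Gaussian perturbation model mentioned in the abstract: an arbitrary instance is perturbed by independent Gaussian noise of standard deviation σ before the algorithm runs. The function name kmeans, the parameter choices, and the uniform "adversarial" instance below are illustrative assumptions for this sketch, not taken from the paper.

import numpy as np

def kmeans(points, k, rng, max_iter=10_000):
    """Lloyd's k-means method: alternate assignment and mean steps
    until the clustering stops changing (or max_iter iterations elapse)."""
    n, d = points.shape
    # Initialize centers with k distinct data points chosen at random.
    centers = points[rng.choice(n, size=k, replace=False)].copy()
    assignment = np.full(n, -1)
    for it in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_assignment = dists.argmin(axis=1)
        if np.array_equal(new_assignment, assignment):
            return centers, assignment, it  # clustering unchanged: converged
        assignment = new_assignment
        # Mean step: move each center to the centroid of its cluster.
        for j in range(k):
            members = points[assignment == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, assignment, max_iter

# Smoothed-analysis experiment: perturb an (arbitrary, here uniform) instance
# with Gaussian noise of standard deviation sigma and count the iterations
# k-means needs on the perturbed input.
rng = np.random.default_rng(0)
n, d, k, sigma = 200, 2, 5, 0.1
adversarial = rng.uniform(0.0, 1.0, size=(n, d))   # stand-in for a worst-case input
perturbed = adversarial + rng.normal(0.0, sigma, size=(n, d))
_, _, iterations = kmeans(perturbed, k, rng)
print(f"k-means converged after {iterations} iterations")

The quantity bounded in the paper is the expectation, over the Gaussian perturbation, of the number of iterations such a run performs; the upper bounds in the abstract are stated in terms of n, k, d, and σ^{-1}.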