The effectiveness of Lloyd-type methods for the k-means problem

Authors:
Rafail Ostrovsky;Yuval Rabani;Leonard J. Schulman;Chaitanya Swamy
Affiliations:
University of California, Los Angeles, CA;The Hebrew University of Jerusalem, Jerusalem, Israel;California Institute of Technology, Pasadena, CA;University of Waterloo, Waterloo, Canada
Venue:
Journal of the ACM (JACM)
Year:
2013

Abstract

We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
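
The abstract does not spell out the seeding process, so the following is only a minimal sketch of the general shape of a Lloyd-type iteration started from a probabilistic seeding. It assumes a distance-squared sampling rule (each new center is drawn with probability proportional to its squared distance from the nearest center already chosen), which is a stand-in and not necessarily the authors' procedure. The names `seed_centers`, `lloyd`, and `cost` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def seed_centers(points, k, rng):
    """Assumed seeding rule (not taken from the abstract): first center
    uniform at random, later centers sampled proportionally to the squared
    distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, k, iterations=20, seed=0):
    """Lloyd-type iteration: assign points to the nearest center, then move
    each center to the mean of its cluster, until nothing changes."""
    rng = random.Random(seed)
    centers = seed_centers(points, k, rng)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                ))
            else:
                # Keep the old center if its cluster is empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    # Two well-separated groups of four points each.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    found = lloyd(data, k=2)
    print(found, cost(data, found))
```

On an instance like this, where the clusters are far apart relative to their spread, distance-squared sampling is very likely to place the initial centers in different clusters, which is the intuition behind seeding carefully before iterating.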