We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
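To make the idea of a seeded Lloyd-type iteration concrete, here is a minimal sketch. The abstract does not specify the seeding procedure, so the seeding shown is an assumption: a generic distance-weighted rule in which each new center is drawn with probability proportional to its squared distance from the centers already chosen. It is not claimed to be the procedure analyzed in the paper. The function names `seed_centers`, `lloyd`, and `cost` are hypothetical and chosen only for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cost(points, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def seed_centers(points, k, rng):
    """Illustrative distance-weighted seeding (an assumption, not the paper's rule)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # All points coincide with chosen centers; fall back to a uniform draw.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def lloyd(points, k, iters=20, seed=0):
    """Lloyd iteration: assign points to the nearest center, then recenter."""
    rng = random.Random(seed)
    centers = seed_centers(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    final = lloyd(pts, 2)
    print(final, cost(pts, final))
```

Each Lloyd step (reassignment followed by recentering) cannot increase the k-means cost, so the quality of the final clustering depends heavily on the starting configuration; this is why the seeding step is the natural place to look for guarantees.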