We investigate variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
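The abstract says the main contribution is a probabilistic seeding step that precedes a Lloyd-type iteration, but it does not give the sampling distribution. The sketch below substitutes D²-weighted seeding, the scheme popularized by k-means++, and follows it with plain Lloyd iterations. It illustrates the general shape of "random seeding, then Lloyd", not the paper's exact procedure. The function names `sq_dist`, `d2_seed`, `lloyd`, and `kmeans_cost` are hypothetical and used only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def d2_seed(points, k, rng):
    """D^2-weighted seeding (a k-means++-style stand-in, not necessarily the
    paper's process): the first center is drawn uniformly, and each later
    center is drawn with probability proportional to its squared distance
    from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
            continue
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iteration: assign each point to its nearest center, then
    move each center to the mean of its cluster, until nothing changes."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centers.append(c)  # an empty cluster keeps its old center
        if new_centers == centers:  # assignments are stable, so we have converged
            break
        centers = new_centers
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    # Two well-separated groups of three points each (a toy, clearly clusterable input).
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers = lloyd(pts, d2_seed(pts, 2, random.Random(0)))
    print(sorted(centers), kmeans_cost(pts, centers))
```

On this toy input, every seeding converges to the two group means, (1/3, 1/3) and (31/3, 31/3), with cost 8/3. The seeding favors far-apart starting centers, which is the intuition behind combining a careful random start with Lloyd's local updates on well-clusterable data.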