The Effectiveness of Lloyd-Type Methods for the k-Means Problem

Authors:
Rafail Ostrovsky; Yuval Rabani; Leonard J. Schulman; Chaitanya Swamy
Affiliations:
UCLA, USA; Israel Institute of Technology, Israel; Caltech, USA; Caltech, USA
Venue:
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Year:
2006

Abstract

We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (half a century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
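
To make the shape of a seeded Lloyd-type iteration concrete, here is a minimal Python sketch. The abstract does not specify the seeding rule, so the sketch assumes one for illustration: each new center is sampled with probability proportional to its squared distance from the nearest center chosen so far. The function names (`seed_centers`, `lloyd`, `kmeans_cost`) are hypothetical, and this should not be read as the authors' exact procedure.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def seed_centers(points, k, rng):
    """Probabilistic seeding (ASSUMED rule, not taken from the abstract):
    the first center is uniform; each later center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, centers, max_iters=100):
    """Lloyd's iteration: assign each point to its nearest center, then move
    each center to the centroid of its cluster, until nothing changes."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)  # empty cluster keeps its center
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Usage on a small, well-separated toy instance (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
rng = random.Random(0)
seeds = seed_centers(points, 2, rng)
final = lloyd(points, seeds)
print(kmeans_cost(points, seeds), kmeans_cost(points, final))
```

On instances whose clusters are far apart relative to their spread, distance-weighted sampling tends to place one seed in each cluster, so the subsequent Lloyd steps mostly refine centers rather than repair a poor start. The Lloyd steps themselves never increase the objective, which is why the second printed cost is at most the first.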