A local search approximation algorithm for k-means clustering

  • Authors:
  • Tapas Kanungo; David M. Mount; Nathan S. Netanyahu; Christine D. Piatko; Ruth Silverman; Angela Y. Wu

  • Affiliations:
  • IBM Almaden Research Center, San Jose, CA; University of Maryland, College Park, MD; Bar-Ilan University, Ramat-Gan, Israel and Center for Automation Research, University of Maryland, College Park, MD; The Johns Hopkins University Applied Physics Laboratory, Laurel, MD; University of Maryland, College Park, MD; American University, Washington, DC

  • Venue:
  • Proceedings of the eighteenth annual symposium on Computational geometry
  • Year:
  • 2002


Abstract

In k-means clustering we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the extremely high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance.

We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9+ε)-approximation algorithm. We show that the approximation factor is almost tight, by giving an example for which the algorithm achieves an approximation factor of (9−ε). To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with Lloyd's algorithm, this heuristic performs quite well in practice.
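The single-swap idea described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's algorithm: here the candidate centers are restricted to the data points themselves, whereas the paper draws candidates from a carefully constructed discrete set, and the paper also analyzes multi-center (p-swap) variants. The function names are invented for illustration.

```python
import random

def cost(points, centers):
    # Sum of squared Euclidean distances from each point to its nearest center.
    return sum(
        min(sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers)
        for p in points
    )

def swap_local_search(points, k, seed=0):
    # Toy single-swap local search: candidates are the data points themselves
    # (an assumption of this sketch; the paper uses a discretized candidate set).
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    best = cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in centers:
                    continue
                # Try swapping center i out and the candidate in.
                trial = centers[:i] + [cand] + centers[i + 1:]
                c = cost(points, trial)
                if c < best:
                    centers, best = trial, c
                    improved = True
    return centers, best
```

In practice, as the empirical study suggests, each accepted swap would be followed by Lloyd-style refinement (recompute cluster centroids until convergence) to descend further before the next swap is considered.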