A local search approximation algorithm for k-means clustering

Authors:
Tapas Kanungo; David M. Mount; Nathan S. Netanyahu; Christine D. Piatko; Ruth Silverman; Angela Y. Wu
Affiliations:
IBM Almaden Research Center, San Jose, CA; Department of Computer Science, University of Maryland, College Park, MD; Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel and Center for Automation Research, University of Maryland, College Park, MD; The Johns Hopkins University Applied Physics Laboratory, Laurel, MD; Center for Automation Research, University of Maryland, College Park, MD; Department of Computer Science, American University, Washington, DC
Venue:
Computational Geometry: Theory and Applications - Special issue on the 18th annual symposium on computational geometry—SoCG2002
Year:
2004


Abstract

In k-means clustering we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, they are not practical because of the very high constant factors involved. Many heuristics are used in practice, but we know of no bounds on their performance.

We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9 + ε)-approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9 - ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study showing that, when combined with Lloyd's algorithm, this heuristic performs quite well in practice.
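The swap-based local improvement idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that candidate centers are the data points themselves, that the initial solution is the first k points, and that a swap is accepted whenever it lowers the mean squared distance by a small relative margin (the hypothetical parameter `min_gain`). The names `distortion` and `single_swap_search` are likewise invented here, and no approximation guarantee is claimed for this simplified version.

```python
import itertools


def distortion(points, centers):
    """Mean squared distance from each point to its nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total / len(points)


def single_swap_search(points, k, min_gain=1e-9):
    """Single-swap local search (illustrative sketch).

    Assumptions (not taken from the abstract): candidate centers are the
    data points, the start is the first k points, and a swap is accepted
    if it reduces the cost by more than a relative factor of min_gain.
    """
    candidates = list(points)
    centers = candidates[:k]
    best = distortion(points, centers)
    improved = True
    while improved:
        improved = False
        # Try replacing each current center with each unused candidate.
        for i, cand in itertools.product(range(k), candidates):
            if cand in centers:
                continue
            trial = centers[:i] + [cand] + centers[i + 1:]
            cost = distortion(points, trial)
            if cost < best * (1 - min_gain):
                centers, best = trial, cost
                improved = True
                break  # restart the scan from the improved solution
    return centers, best
```

Calling `single_swap_search(points, k)` with `points` given as a list of equal-length tuples returns the final centers and their mean squared distance. In the spirit of the abstract's empirical study, the returned centers could then be used to seed Lloyd's algorithm, although that combination is not shown here.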