In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, they are not practical because of the very high constant factors involved. Many heuristics are used in practice, but we know of no bounds on their performance.

We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9 + ε)-approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9 - ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study showing that, when combined with Lloyd's algorithm, the heuristic performs quite well in practice.
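
The swap idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the analysis behind the (9 + ε) bound. Several details are assumptions made only for illustration, since the abstract does not specify them: candidate centers default to the data points themselves, the search starts from the first k candidates, a swap is accepted only if it lowers the cost by a relative factor of at least `min_gain`, and the names `swap_local_search` and `lloyd_step` are hypothetical.

```python
from itertools import product


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def swap_local_search(points, k, candidates=None, min_gain=1e-9):
    """Single-swap local search over a finite candidate set.

    Assumptions (not from the abstract): candidates default to the data
    points, the start is the first k candidates, and a swap must reduce
    the cost by a relative factor of at least min_gain to be accepted.
    """
    candidates = list(candidates if candidates is not None else points)
    centers = candidates[:k]
    current = cost(points, centers)
    improved = True
    while improved:
        improved = False
        # Try replacing the center in slot i with each unused candidate.
        for i, cand in product(range(k), candidates):
            if cand in centers:
                continue
            trial = centers[:i] + [cand] + centers[i + 1:]
            trial_cost = cost(points, trial)
            if trial_cost < current * (1 - min_gain):
                centers, current = trial, trial_cost
                improved = True
                break  # restart the scan from the new solution
    return centers, current


def lloyd_step(points, centers):
    """One Lloyd iteration: assign each point to its nearest center,
    then move each center to the centroid of its assigned points."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        groups[nearest].append(p)
    return [
        tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c  # keep empty clusters' centers
        for g, c in zip(groups, centers)
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    found, found_cost = swap_local_search(data, k=2)
    refined = lloyd_step(data, found)
    print(found, found_cost, cost(data, refined))
```

In this sketch the swap phase restricts centers to candidate points, while the Lloyd step moves them to centroids, which cannot increase the cost. How the two phases are actually interleaved in the empirical study is not stated in the abstract, so the ordering shown here is only one plausible arrangement.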