Fast k-means algorithms with constant approximation

Authors:
Mingjun Song;Sanguthevar Rajasekaran
Affiliations:
Computer Science and Engineering, University of Connecticut, Storrs, CT;Computer Science and Engineering, University of Connecticut, Storrs, CT
Venue:
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Year:
2005

Abstract

In this paper we study the k-means clustering problem. It is well known that the general version of this problem is $\mathcal{NP}$-hard, and numerous approximation algorithms have been proposed for it. In this paper, we propose three constant-factor approximation algorithms for k-means clustering. The first algorithm runs in time $O((k/\epsilon)^{k} nd)$, where $k$ is the number of clusters, $n$ is the number of input points, and $d$ is the dimension (number of attributes) of the points. The second algorithm runs in time $O(k^{3} n^{2} \log n)$. This is the first algorithm for k-means clustering that runs in time polynomial in $n$, $k$ and $d$. The run time of the third algorithm, $O(k^{5} \log^{3} k \cdot d)$, is independent of $n$. Although an algorithm whose run time is independent of $n$ is known for the k-median problem, ours is the first such algorithm for the k-means problem.
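For reference, the k-means problem asks for k centers that minimize the sum of squared Euclidean distances from each input point to its nearest center, and a c-approximation algorithm returns centers whose cost is at most c times the optimum. The Python sketch below only illustrates these two definitions; it is not any of the three algorithms from the paper. The helper names (`kmeans_cost`, `optimal_cost_bruteforce`, `approximation_ratio`) are hypothetical, and the exact optimum is found by exhaustive search over all k^n assignments of points to clusters, which is feasible only for tiny inputs.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two d-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def centroid(cluster: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty cluster."""
    d = len(cluster[0])
    return tuple(sum(p[i] for p in cluster) / len(cluster) for i in range(d))


def optimal_cost_bruteforce(points: Sequence[Point], k: int) -> float:
    """Exact k-means optimum by trying every labelling of points with k labels.

    Hypothetical illustration only: runs in O(k^n * n * k * d) time.
    For each partition, the centroids are used as centers; the minimum over
    all partitions of the resulting nearest-center cost equals the optimum.
    """
    best = float("inf")
    for labels in product(range(k), repeat=len(points)):
        clusters: List[List[Point]] = [
            [p for p, lab in zip(points, labels) if lab == j] for j in range(k)
        ]
        centers = [centroid(c) for c in clusters if c]  # skip empty clusters
        best = min(best, kmeans_cost(points, centers))
    return best


def approximation_ratio(points: Sequence[Point], centers: Sequence[Point], k: int) -> float:
    """Ratio cost(centers) / OPT; a c-approximation guarantees this is at most c."""
    opt = optimal_cost_bruteforce(points, k)
    return kmeans_cost(points, centers) / opt if opt > 0 else 1.0


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    candidate = [(0.0,), (10.0,)]
    print(kmeans_cost(pts, candidate))             # 2.0
    print(optimal_cost_bruteforce(pts, 2))         # 1.0 (centers 0.5 and 10.5)
    print(approximation_ratio(pts, candidate, 2))  # 2.0
```

In this toy instance the candidate centers cost exactly twice the optimum, so they would be acceptable output for any constant-factor guarantee of 2 or more. The paper's algorithms avoid the exponential search entirely; the brute force here exists only to make "constant approximation" concrete.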