We present new approximation algorithms for the $k$-median and $k$-means clustering problems. To this end, we obtain small coresets for $k$-median and $k$-means clustering in general metric spaces and in Euclidean spaces. In $\mathbb{R}^d$, the size of these coresets depends polynomially on the dimension $d$. This leads to $(1+\varepsilon)$-approximation algorithms for the optimal $k$-median and $k$-means clustering in $\mathbb{R}^d$, with running time $O(ndk+2^{(k/\varepsilon)^{O(1)}}d^2\log^{k+2}n)$, where $n$ is the number of points. This improves over previous results. We use these coresets to maintain a $(1+\varepsilon)$-approximate $k$-median and $k$-means clustering of a stream of points in $\mathbb{R}^d$, using $O(d^2k^2\varepsilon^{-2}\log^8n)$ space. These are the first streaming algorithms for these problems whose space complexity depends polynomially on the dimension.
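To make the notion of a coreset concrete, the following is a minimal Python sketch of the quantity a coreset is meant to preserve: a small weighted point set whose weighted $k$-means cost stays close to the cost of the full set for a given set of centers. It is not the construction from the paper. The function names `kmeans_cost` and `uniform_weighted_sample` are hypothetical, and plain uniform sampling is used only as a stand-in, which carries no worst-case $(1+\varepsilon)$ guarantee.

```python
import math
import random


def kmeans_cost(points, centers, weights=None):
    """Weighted k-means cost: sum over p of w(p) * min_c ||p - c||^2."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        # Squared Euclidean distance from p to its nearest center.
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


def uniform_weighted_sample(points, m, rng):
    """Illustrative stand-in for a coreset (assumption, not the paper's method):
    draw m points without replacement; each one represents n/m originals."""
    sample = rng.sample(points, m)
    weights = [len(points) / m] * m
    return sample, weights


# Synthetic data: two well-separated Gaussian blobs in the plane.
rng = random.Random(0)
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(500)
]
centers = [(0.0, 0.0), (10.0, 10.0)]

sample, weights = uniform_weighted_sample(points, 100, rng)
full_cost = kmeans_cost(points, centers)
sample_cost = kmeans_cost(sample, centers, weights)
print("full cost:", full_cost)
print("weighted sample cost:", sample_cost)
```

A true coreset must keep the two costs within a factor of $(1\pm\varepsilon)$ for every choice of $k$ centers simultaneously, which is why constructions with guarantees sample non-uniformly rather than as above.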