Center-based clustering under perturbation stability
Information Processing Letters
The effectiveness of Lloyd-type methods for the k-means problem. Journal of the ACM (JACM).
Data stability in clustering: a closer look. ALT'12: Proceedings of the 23rd International Conference on Algorithmic Learning Theory.
Clustering under approximation stability. Journal of the ACM (JACM).
Approximating k-median via pseudo-approximation. Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing.
We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al. show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal $k$-means clustering by a factor of $1/\epsilon^2$, then one can achieve a $(1+f(\epsilon))$-approximation to the $k$-means optimum in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the $(k-1)$-means optimum is more expensive than the $k$-means optimum by a factor $1+\alpha$ for {\em some} constant $\alpha > 0$, we can obtain a PTAS. In particular, under this assumption, for any $\epsilon > 0$ we achieve a $(1+\epsilon)$-approximation to the $k$-means optimum in time polynomial in $n$ and $k$, and exponential in $1/\epsilon$ and $1/\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption. For $k$-means, we additionally give a randomized algorithm with improved running time $n^{O(1)}(k \log n)^{\mathrm{poly}(1/\epsilon,1/\alpha)}$. Our technique also yields a PTAS under the assumption of Balcan et al. that all $(1+\alpha)$-approximations are $\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\delta n$ and $\alpha > 0$ is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target.
From this perspective, our improvement is that for $k$-means in Euclidean spaces we reduce the distance of the clustering found to the target from $O(\delta)$ to $\delta$ when all target clusters are large, and for $k$-median we improve the ``largeness'' condition needed in the work of Balcan et al. to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$. Our results are based on a new notion of clustering stability.
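To make the separation assumption concrete, here is a minimal sketch (not the paper's algorithm) of checking the $(1+\alpha)$ condition $\mathrm{OPT}_{k-1} \geq (1+\alpha)\,\mathrm{OPT}_k$ on a toy one-dimensional instance, where optimal $k$-means is computable exactly by dynamic programming over sorted points (optimal 1-D clusters are contiguous intervals). The function names and data are illustrative assumptions, not from the paper.

```python
# Illustrative sketch: verify the (1 + alpha) separation condition
# OPT_{k-1} >= (1 + alpha) * OPT_k on 1-D data, where the optimal
# k-means cost can be computed exactly by interval dynamic programming.

from itertools import accumulate

def optimal_kmeans_cost_1d(points, k):
    """Exact optimal k-means cost for 1-D points via interval DP."""
    xs = sorted(points)
    n = len(xs)
    pre = [0.0] + list(accumulate(xs))                   # prefix sums
    pre2 = [0.0] + list(accumulate(x * x for x in xs))   # prefix sums of squares

    def interval_cost(i, j):
        # Cost of a single cluster covering xs[i:j]: sum of squared
        # distances to the cluster mean, via the identity
        # sum (x - mu)^2 = sum x^2 - (sum x)^2 / m.
        m = j - i
        s = pre[j] - pre[i]
        s2 = pre2[j] - pre2[i]
        return s2 - s * s / m

    INF = float("inf")
    # dp[c][j]: optimal cost of splitting the first j sorted points into c clusters.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            dp[c][j] = min(dp[c - 1][i] + interval_cost(i, j)
                           for i in range(c - 1, j))
    return dp[k][n]

# Two well-separated groups: the 2-means optimum is far cheaper than the
# 1-means optimum, so this instance is (1 + alpha)-separated for a large alpha.
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
opt2 = optimal_kmeans_cost_1d(data, 2)
opt1 = optimal_kmeans_cost_1d(data, 1)
alpha = opt1 / opt2 - 1  # separation: OPT_1 = (1 + alpha) * OPT_2
```

On this toy instance the separation parameter $\alpha$ is very large; the paper's point is that any constant $\alpha > 0$ already suffices for a PTAS.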