The algebraic degree of geometric optimization problems
Discrete & Computational Geometry
SCG '94 Proceedings of the tenth annual symposium on Computational geometry
Distributional clustering of words for text classification
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
ACM Computing Surveys (CSUR)
Approximate clustering via core-sets
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
A new greedy approach for facility location problems
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Parallel Optimization: Theory, Algorithms and Applications
Parallel Optimization: Theory, Algorithms and Applications
A Nearly Linear-Time Approximation Scheme for the Euclidean k-median Problem
ESA '99 Proceedings of the 7th Annual European Symposium on Algorithms
Approximation schemes for clustering problems
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
A divisive information theoretic feature clustering algorithm for text classification
The Journal of Machine Learning Research
Bounded Geometries, Fractals, and Low-Distortion Embeddings
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Distributional clustering of English words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
On coresets for k-means and k-median clustering
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Optimal Time Bounds for Approximate Clustering
Machine Learning
A Simple Linear Time (1+ε)-Approximation Algorithm for k-Means Clustering in Any Dimensions
FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science
Quick k-Median, k-Center, and Facility Location for Sparse Graphs
SIAM Journal on Computing
On k-Median clustering in high dimensions
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing)
Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing)
The Effectiveness of Lloyd-Type Methods for the k-Means Problem
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Clustering with Bregman Divergences
The Journal of Machine Learning Research
A PTAS for k-means clustering based on weak coresets
SCG '07 Proceedings of the twenty-third annual symposium on Computational geometry
k-means++: the advantages of careful seeding
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Clustering for metric and non-metric distance measures
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Mixed Bregman Clustering with Approximation Guarantees
ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
Coresets and approximate clustering for Bregman divergences
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Linear time algorithms for clustering problems in any dimensions
ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming
Survey of clustering algorithms
IEEE Transactions on Neural Networks
Smoothed Analysis of the k-Means Method
Journal of the ACM (JACM)
Deterministic sublinear-time approximations for metric 1-median selection
Information Processing Letters
We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such that the sum of errors D(P,C) = ∑_{p ∈ P} min_{c ∈ C} D(p,c) is minimized. The main result in this article can be stated as follows: There exists a (1+ε)-approximation algorithm for the k-median problem with respect to D, if the 1-median problem can be approximated within a factor of (1+ε) by taking a random sample of constant size and solving the 1-median problem on the sample exactly. This algorithm requires time n · 2^{O(mk log(mk/ε))}, where m is a constant that depends only on ε and D. Using this characterization, we obtain the first linear-time (1+ε)-approximation algorithms for the k-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we obtain previously known results for the Euclidean k-median problem and the Euclidean k-means problem in a simplified manner. Our results are based on a new analysis of an algorithm of Kumar et al. [2004].
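To make the objective and the sampling condition concrete, the following is a minimal Python sketch, not the algorithm analyzed in the article. It evaluates the sum of errors D(P,C) for a caller-supplied dissimilarity and illustrates the "sample, then solve 1-median exactly on the sample" step for one assumed choice of D, the squared Euclidean distance, where the exact 1-median of a point set is its centroid. The function names and the sample-size parameter `m` are hypothetical; the abstract only states that a suitable constant m exists, depending on ε and D.

```python
import random
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Dissimilarity = Callable[[Point, Point], float]


def sq_euclidean(p: Point, c: Point) -> float:
    """Squared Euclidean distance (assumed here as one example of D)."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def cost(P: Sequence[Point], C: Sequence[Point], D: Dissimilarity) -> float:
    """Sum of errors D(P, C): each point pays for its closest center in C."""
    return sum(min(D(p, c) for c in C) for p in P)


def centroid(S: Sequence[Point]) -> Point:
    """Exact 1-median of S under squared Euclidean distance."""
    n = len(S)
    return tuple(sum(coords) / n for coords in zip(*S))


def sampled_one_median(P: Sequence[Point], m: int, rng: random.Random) -> Point:
    """Draw m points uniformly without replacement and solve the
    1-median problem exactly on that sample only (hypothetical helper)."""
    sample = rng.sample(list(P), m)
    return centroid(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    P = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    exact = centroid(P)
    approx = sampled_one_median(P, m=50, rng=rng)
    # Ratio of the sampled center's cost to the optimal 1-median cost.
    print(cost(P, [approx], sq_euclidean) / cost(P, [exact], sq_euclidean))
```

The printed ratio compares the cost of the sampled center with the optimal 1-median cost on one random instance; it is only an empirical illustration of the sampling property and says nothing about the constant m required for a given ε.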