A common approach to clustering data is to view data objects as points in a metric space and then to optimize a natural distance-based objective such as the k-median, k-means, or min-sum score. For applications such as clustering proteins by function or clustering images by subject, the implicit hope in taking this approach is that the optimal solution for the chosen objective will closely match the desired “target” clustering (e.g., a correct clustering of proteins by function or of images by who is in them). However, most distance-based objectives, including those mentioned here, are NP-hard to optimize. Thus, assuming P ≠ NP, this hope by itself is not sufficient to achieve low-error clusterings via polynomial-time algorithms. In this article, we show that we can bypass this barrier if we slightly extend the assumption to ask that, for some small constant c, not only the optimal solution but also all c-approximations to it differ from the target on at most an ε fraction of points; we call this property (c,ε)-approximation-stability. We show that under this condition it is possible to efficiently obtain low-error clusterings even when the property holds only for values of c for which the objective is known to be NP-hard to approximate. Specifically, for any constant c > 1, (c,ε)-approximation-stability of the k-median or k-means objective can be used to efficiently produce a clustering of error O(ε) with respect to the target clustering, as can stability of the min-sum objective if the target clusters are sufficiently large. Thus, we can perform nearly as well in terms of agreement with the target clustering as if we could approximate these objectives to this NP-hard value.
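To make the definitions concrete, the sketch below (a minimal illustration, not the authors' algorithm; all function names and the brute-force matching are our own choices) computes a k-median cost, the error between two clusterings as the fraction of points on which they disagree under the best matching of cluster indices, and a check of the (c,ε)-approximation-stability condition for a single candidate clustering. The permutation-based matching is exponential in k and only feasible for small k.

```python
from itertools import permutations

def kmedian_cost(points, labels, centers, dist):
    # k-median cost: sum of distances from each point to its cluster's center.
    return sum(dist(p, centers[l]) for p, l in zip(points, labels))

def clustering_error(labels_a, labels_b, k):
    # Fraction of points on which two k-clusterings disagree, minimized over
    # all matchings (permutations) of cluster indices. This is the notion of
    # "error with respect to the target" used in the abstract.
    n = len(labels_a)
    best = n
    for perm in permutations(range(k)):
        mismatches = sum(1 for a, b in zip(labels_a, labels_b) if perm[a] != b)
        best = min(best, mismatches)
    return best / n

def satisfies_stability(candidate_labels, target_labels,
                        candidate_cost, opt_cost, c, eps, k):
    # (c, eps)-approximation-stability requires that EVERY clustering whose
    # objective cost is at most c times the optimum has error <= eps with
    # respect to the target. This checks the condition for one candidate;
    # candidates costing more than c * OPT impose no constraint.
    if candidate_cost <= c * opt_cost:
        return clustering_error(candidate_labels, target_labels, k) <= eps
    return True
```

For example, with points [0, 1, 10, 11] on the line, labels [0, 0, 1, 1], and centers [0, 10], the k-median cost under |·| is 2; and the clusterings [0, 0, 1, 1] and [1, 1, 0, 0] have error 0 because they agree up to relabeling.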