Approximate clustering without the approximation

  • Authors:
  • Maria-Florina Balcan; Avrim Blum; Anupam Gupta

  • Affiliations:
  • Carnegie Mellon University; Carnegie Mellon University; Carnegie Mellon University

  • Venue:
  • SODA '09: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2009

Abstract

The design of approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much effort spent on gaining a better understanding of the approximation guarantees achievable for objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that better approximations also yield more accurate clusterings. For example, in many problems such as clustering proteins by function or clustering images by subject, there is some unknown correct "target" clustering, and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close pointwise to the truth. In this paper, we show that if we make this implicit assumption explicit (that is, if we assume that any c-approximation to the given clustering objective φ is ε-close to the target), then we can produce clusterings that are O(ε)-close to the target, even for values of c for which obtaining a c-approximation is NP-hard. In particular, for the k-median and k-means objectives we achieve this guarantee for any constant c > 1, and for the min-sum objective we do so for any constant c > 2. Our results also highlight a surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ε-close to the target, and assuming that any approximately optimal solution is ε-close to the target, even for an approximation factor as small as c = 1.01. In the former case, the problem of finding a solution that is O(ε)-close to the target remains computationally hard, and yet in the latter case we give an efficient algorithm.
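
The stability assumption described in the abstract can be written out concretely. The LaTeX sketch below is an assumed formalization for illustration only; the notation (C_T for the target clustering, dist for the fraction of misclassified points under the best matching of cluster labels, OPT_φ for the optimum value of the objective φ) is introduced here and is not quoted from this page.

% Assumed formalization of the stability property sketched in the abstract.
% \mathcal{C}_T, \mathrm{dist}, and \mathrm{OPT}_\varphi are illustrative notation.
\[
  \mathrm{dist}(\mathcal{C}, \mathcal{C}_T)
    \;=\; \min_{\sigma \in S_k} \; \frac{1}{n} \sum_{i=1}^{k}
          \bigl| C_i \setminus C_{T,\sigma(i)} \bigr|
  \qquad \text{(fraction of misclassified points)}
\]
\[
  \text{Assumption: every clustering } \mathcal{C} \text{ with }
  \varphi(\mathcal{C}) \le c \cdot \mathrm{OPT}_\varphi
  \text{ satisfies } \mathrm{dist}(\mathcal{C}, \mathcal{C}_T) \le \varepsilon.
\]
\[
  \text{Claimed guarantee: an efficient algorithm outputs } \mathcal{C}' \text{ with }
  \mathrm{dist}(\mathcal{C}', \mathcal{C}_T) = O(\varepsilon),
  \text{ for any constant } c > 1 \text{ (}k\text{-median, }k\text{-means) or } c > 2 \text{ (min-sum).}
\]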