Clustering under approximation stability

Authors:
Maria-Florina Balcan;Avrim Blum;Anupam Gupta
Affiliations:
Georgia Institute of Technology, Atlanta, GA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
Journal of the ACM (JACM)
Year:
2013

Citing 48
Cited 0

Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming

Journal of the ACM (JACM)
Approximation schemes for Euclidean k-medians and related problems

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
A threshold of ln n for approximating set cover

Journal of the ACM (JACM)
A constant-factor approximation algorithm for the k-median problem (extended abstract)

STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Sublinear time algorithms for metric space problems

STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Greedy strikes back: improved facility location algorithms

Journal of Algorithms
P-Complete Approximation Problems

Journal of the ACM (JACM)
Clustering for edge-cost minimization (extended abstract)

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation

Journal of the ACM (JACM)
Approximating min-sum k-clustering in metric spaces

STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Learning mixtures of arbitrary gaussians

STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
A new greedy approach for facility location problems

STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Approximation schemes for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Learning Mixtures of Gaussians

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Testing of clustering

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Pattern Classification (2nd Edition)

Pattern Classification (2nd Edition)
Local Search Heuristics for k-Median and Facility Location Problems

SIAM Journal on Computing
Expander flows, geometric embeddings and graph partitioning

STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Correlation Clustering

Machine Learning
A local search approximation algorithm for k-means clustering

Computational Geometry: Theory and Applications - Special issue on the 18th annual symposium on computational geometry—SoCG2002
A spectral algorithm for learning mixture models

Journal of Computer and System Sciences - Special issue on FOCS 2002
A Simple Linear Time (1+ ") -Approximation Algorithm for k-Means Clustering in Any Dimensions

FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science
k-means projective clustering

PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Comparing clusterings: an axiomatic view

ICML '05 Proceedings of the 22nd international conference on Machine learning
Programmable clustering

Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The uniqueness of a good optimum for K-means

ICML '06 Proceedings of the 23rd international conference on Machine learning
The Effectiveness of Lloyd-Type Methods for the k-Means Problem

FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
k-means++: the advantages of careful seeding

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
A discriminative framework for clustering via similarity functions

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Introduction to Information Retrieval

Introduction to Information Retrieval
Approximate clustering without the approximation

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
The Planar k-Means Problem is NP-Hard

WALCOM '09 Proceedings of the 3rd International Workshop on Algorithms and Computation
Fast approximate spectral clustering

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Small space representations for metric min-sum k-clustering and their applications

STACS'07 Proceedings of the 24th annual conference on Theoretical aspects of computer science
Agnostic clustering

ALT'09 Proceedings of the 20th international conference on Algorithmic learning theory
Clustering with or without the approximation

COCOON'10 Proceedings of the 16th annual international conference on Computing and combinatorics
Settling the Polynomial Learnability of Mixtures of Gaussians

FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
Polynomial Learning of Distribution Families

FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
Clustering with Spectral Norm and the k-Means Algorithm

FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
Stability Yields a PTAS for k-Median and k-Means Clustering

FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
Center-based clustering under perturbation stability

Information Processing Letters
The spectral method for general mixture models

COLT'05 Proceedings of the 18th annual conference on Learning Theory
On spectral learning of mixtures of distributions

COLT'05 Proceedings of the 18th annual conference on Learning Theory
Local equivalences of distances between clusterings--a geometric perspective

Machine Learning
Active clustering of biological sequences

The Journal of Machine Learning Research
Least squares quantization in PCM

IEEE Transactions on Information Theory
Clustering under perturbation resilience

ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part I

Quantified Score

Hi-index	0.00

Visualization

Abstract

A common approach to clustering data is to view data objects as points in a metric space, and then to optimize a natural distance-based objective such as the k-median, k-means, or min-sum score. For applications such as clustering proteins by function or clustering images by subject, the implicit hope in taking this approach is that the optimal solution for the chosen objective will closely match the desired “target” clustering (e.g., a correct clustering of proteins by function or of images by who is in them). However, most distance-based objectives, including those mentioned here, are NP-hard to optimize. So, this assumption by itself is not sufficient, assuming P ≠ NP, to achieve clusterings of low-error via polynomial time algorithms. In this article, we show that we can bypass this barrier if we slightly extend this assumption to ask that for some small constant c, not only the optimal solution, but also all c-approximations to the optimal solution, differ from the target on at most some ε fraction of points—we call this (c,ε)-approximation-stability. We show that under this condition, it is possible to efficiently obtain low-error clusterings even if the property holds only for values c for which the objective is known to be NP-hard to approximate. Specifically, for any constant c 1, (c,ε)-approximation-stability of k-median or k-means objectives can be used to efficiently produce a clustering of error O(ε) with respect to the target clustering, as can stability of the min-sum objective if the target clusters are sufficiently large. Thus, we can perform nearly as well in terms of agreement with the target clustering as if we could approximate these objectives to this NP-hard value.