Data stability in clustering: a closer look

Authors:
Lev Reyzin
Affiliations:
School of Computer Science, Georgia Institute of Technology, Atlanta, GA
Venue:
ALT'12 Proceedings of the 23rd international conference on Algorithmic Learning Theory
Year:
2012

Citing 22

Approximation schemes for Euclidean k-medians and related problems
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing

Greedy strikes back: improved facility location algorithms
Journal of Algorithms

Approximating min-sum k-clustering in metric spaces
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing

A new greedy approach for facility location problems
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing

Computers and Intractability: A Guide to the Theory of NP-Completeness

A constant-factor approximation algorithm for the k-median problem
Journal of Computer and System Sciences - STOC 1999

Data streams: algorithms and applications
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms

Clustering Data Streams: Theory and Practice
IEEE Transactions on Knowledge and Data Engineering

Better streaming algorithms for clustering problems
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing

Approximation schemes for clustering problems
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing

Local Search Heuristics for k-Median and Facility Location Problems
SIAM Journal on Computing

The Effectiveness of Lloyd-Type Methods for the k-Means Problem
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science

A discriminative framework for clustering via similarity functions
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing

Approximate clustering without the approximation
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms

Linear-time approximation schemes for clustering problems in any dimensions
Journal of the ACM (JACM)

Clustering with or without the approximation
COCOON'10 Proceedings of the 16th annual international conference on Computing and combinatorics

Stability Yields a PTAS for k-Median and k-Means Clustering
FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science

Partition into triangles on bounded degree graphs
SOFSEM'11 Proceedings of the 37th international conference on Current trends in theory and practice of computer science

Center-based clustering under perturbation stability
Information Processing Letters

Alternative measures of computational complexity with applications to agnostic learning
TAMC'06 Proceedings of the Third international conference on Theory and Applications of Models of Computation

Least squares quantization in PCM
IEEE Transactions on Information Theory

Clustering under perturbation resilience
ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part I

Abstract

We consider the model introduced by Bilu and Linial [12], who study problems for which the optimal clustering does not change when distances are perturbed. They show that even when a problem is NP-hard, it is sometimes possible to obtain efficient algorithms for instances resilient to certain multiplicative perturbations, e.g., on the order of $O(\sqrt{n})$ for max-cut clustering. Awasthi et al. [6] consider center-based objectives, and Balcan and Liang [9] analyze the k-median and min-sum objectives, giving efficient algorithms for instances resilient to certain constant multiplicative perturbations. Here, we are motivated by the question of to what extent these assumptions can be relaxed while still allowing for efficient algorithms. We show there is little room to improve these results by giving NP-hardness lower bounds for both the k-median and min-sum objectives. On the other hand, we show that multiplicative resilience parameters, even only on the order of Θ(1), can be so strong as to make the clustering problem trivial, and we exploit these assumptions to present a simple one-pass streaming algorithm for the k-median objective. We also consider a model of additive perturbations and give a correspondence between additive and multiplicative notions of stability. Our results provide a close examination of the consequences of assuming even constant stability in data.
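
To make the notion of multiplicative perturbation concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the usual reading of the model: every pairwise distance may be scaled by a factor in [1, α], and an instance is resilient if the optimal clustering is the same under every such perturbation. The helper names `kmedian_clustering` and `perturb`, the four-point instance, and the choice α = 2 are all hypothetical and chosen only for illustration. The optimum is found by brute force over center sets, so the sketch is usable only on tiny inputs.

```python
from itertools import combinations


def kmedian_clustering(d, k):
    """Brute-force k-median: try every set of k centers drawn from the points.

    d is a symmetric n x n matrix of pairwise distances.
    Returns (optimal cost, partition as a frozenset of frozensets of indices).
    """
    n = len(d)
    best_cost, best_partition = float("inf"), None
    for centers in combinations(range(n), k):
        cost = 0.0
        clusters = {c: set() for c in centers}
        for p in range(n):
            # Assign each point to its nearest center.
            nearest = min(centers, key=lambda c: d[p][c])
            clusters[nearest].add(p)
            cost += d[p][nearest]
        if cost < best_cost:
            best_cost = cost
            best_partition = frozenset(frozenset(s) for s in clusters.values())
    return best_cost, best_partition


def perturb(d, factor):
    """Scale each distance d[i][j] by factor(i, j).

    The caller is responsible for keeping factor symmetric and within [1, alpha].
    """
    n = len(d)
    return [[d[i][j] * factor(i, j) for j in range(n)] for i in range(n)]


# Hypothetical instance: four points on a line, forming two tight pairs.
xs = [0, 1, 10, 11]
d = [[abs(a - b) for b in xs] for a in xs]

# One particular 2-perturbation: stretch distances inside each pair,
# leave distances between the pairs unchanged.
alpha = 2.0
d_perturbed = perturb(d, lambda i, j: alpha if (i < 2) == (j < 2) else 1.0)

cost, clustering = kmedian_clustering(d, 2)
cost_perturbed, clustering_perturbed = kmedian_clustering(d_perturbed, 2)
unchanged = clustering == clustering_perturbed
```

Checking one perturbation, as done here, can only refute resilience, never establish it, since the definition quantifies over all admissible perturbations.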