A discriminative framework for clustering via similarity functions

Authors:
Maria-Florina Balcan; Avrim Blum; Santosh Vempala
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA; Georgia Institute of Technology, Atlanta, GA, USA
Venue:
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Year:
2008

Citing 29
Cited 17

A theory of the learnable

Communications of the ACM
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms

The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
A Spectral Technique for Coloring Random 3-Colorable Graphs

SIAM Journal on Computing
A constant-factor approximation algorithm for the k-median problem (extended abstract)

STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation

Journal of the ACM (JACM)
Learning mixtures of arbitrary gaussians

STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
AI Game Programming Wisdom

AI Game Programming Wisdom
Diffusion Kernels on Graphs and Other Discrete Input Spaces

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Approximation schemes for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Learning Mixtures of Gaussians

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Spectral Partitioning of Random Graphs

FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
Clustering with Qualitative Information

FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Random sampling and approximation of MAX-CSPs

Journal of Computer and System Sciences - STOC 2002
Kernel Methods for Pattern Analysis

Kernel Methods for Pattern Analysis
Correlation Clustering: maximizing agreements via semidefinite programming

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
On clusterings: Good, bad and spectral

Journal of the ACM (JACM)
Correlation Clustering

Machine Learning
A spectral algorithm for learning mixture models

Journal of Computer and System Sciences - Special issue on FOCS 2002
Aggregating inconsistent information: ranking and clustering

Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
On Learning Mixtures of Heavy-Tailed Distributions

FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
On a theory of learning with similarity functions

ICML '06 Proceedings of the 23rd international conference on Machine learning
Kernels as features: On kernels, margins, and low-dimensional mappings

Machine Learning
A divide-and-merge methodology for clustering

ACM Transactions on Database Systems (TODS)
A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering

Machine Learning
Spectral clustering by recursive partitioning

ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
How good is a kernel when used as a similarity measure?

COLT'07 Proceedings of the 20th annual conference on Learning theory
The spectral method for general mixture models

COLT'05 Proceedings of the 18th annual conference on Learning Theory
On spectral learning of mixtures of distributions

COLT'05 Proceedings of the 18th annual conference on Learning Theory
Structural risk minimization over data-dependent hierarchies

IEEE Transactions on Information Theory

A theory of learning with similarity functions

Machine Learning
Clustering with Interactive Feedback

ALT '08 Proceedings of the 19th international conference on Algorithmic Learning Theory
Combinatorial algorithms for nearest neighbors, near-duplicates and small-world design

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Approximate clustering without the approximation

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Combinatorial Framework for Similarity Search

SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Learning unknown graphs

ALT'09 Proceedings of the 20th international conference on Algorithmic learning theory
Agnostic clustering

ALT'09 Proceedings of the 20th international conference on Algorithmic learning theory
Fast euclidean minimum spanning tree: algorithm, analysis, and applications

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Nearest neighbor search: algorithmic perspective

SIGSPATIAL Special
Clustering with or without the approximation

COCOON'10 Proceedings of the 16th annual international conference on Computing and combinatorics
Predicting the labels of an unknown graph via adaptive exploration

Theoretical Computer Science
Center-based clustering under perturbation stability

Information Processing Letters
Center-Wise intra-inter silhouettes

SUM'12 Proceedings of the 6th international conference on Scalable Uncertainty Management
Data stability in clustering: a closer look

ALT'12 Proceedings of the 23rd international conference on Algorithmic Learning Theory
Data quality evaluation and improvement for prognostic modeling using visual assessment based data partitioning method

Computers in Industry
Clustering under approximation stability

Journal of the ACM (JACM)
A binary-classification-based metric between time-series distributions and its use in statistical and learning problems

The Journal of Machine Learning Research

Abstract

Problems of clustering data from pairwise similarity information are ubiquitous in Computer Science. Theoretical treatments typically view the similarity information as ground truth and then design algorithms to (approximately) optimize various graph-based objective functions. However, in most applications, this similarity information is merely based on some heuristic; the ground truth is really the unknown correct clustering of the data points, and the real goal is to achieve low error on the data. In this work, we develop a theoretical approach to clustering from this perspective. In particular, motivated by recent work in learning theory that asks "what natural properties of a similarity (or kernel) function are sufficient to be able to learn well?", we ask "what natural properties of a similarity function are sufficient to be able to cluster well?" To study this question, we develop a theoretical framework that can be viewed as an analog of the PAC learning model for clustering, where the object of study, rather than being a concept class, is a class of (concept, similarity function) pairs, or equivalently, a property the similarity function should satisfy with respect to the ground-truth clustering. We then analyze both algorithmic and information-theoretic issues in our model. While quite strong properties are needed if the goal is to produce a single approximately correct clustering, we find that a number of reasonable properties are sufficient under two natural relaxations: (a) list clustering: analogous to the notion of list-decoding, the algorithm can produce a small list of clusterings (which a user can select from), and (b) hierarchical clustering: the algorithm's goal is to produce a hierarchy such that the desired clustering is some pruning of this tree (which a user could navigate).
We develop a notion of the clustering complexity of a given property (analogous to notions of capacity in learning theory) that characterizes its information-theoretic usefulness for clustering. We analyze this quantity for several natural game-theoretic and learning-theoretic properties, and design new efficient algorithms that are able to take advantage of them. Our algorithms for hierarchical clustering combine recent learning-theoretic approaches with linkage-style methods. We also show how our algorithms can be extended to the inductive case, i.e., by using just a constant-sized sample, as in property testing. The analysis here uses regularity-type results of [FK] and [AFKK].
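
The hierarchical relaxation described in the abstract can be illustrated with a small sketch. The code below is not one of the paper's algorithms: it is a generic average-linkage procedure over a similarity matrix, followed by a check of whether a given target clustering is a pruning of the resulting tree (every target cluster appears as a tree node, and the clusters partition the points). The function names `average_linkage_tree` and `is_pruning`, and the toy similarity matrix, are assumptions made only for this illustration.

```python
from itertools import combinations


def average_similarity(sim, a, b):
    """Average pairwise similarity between two clusters of point indices."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def average_linkage_tree(sim):
    """Generic bottom-up average linkage (illustrative, not the paper's method).

    Returns every cluster that appears as a node of the merge tree,
    each as a frozenset of point indices.
    """
    active = [frozenset([i]) for i in range(len(sim))]
    nodes = list(active)
    while len(active) > 1:
        # Merge the pair of current clusters with the highest average similarity.
        a, b = max(combinations(active, 2),
                   key=lambda pair: average_similarity(sim, *pair))
        merged = a | b
        active = [c for c in active if c not in (a, b)] + [merged]
        nodes.append(merged)
    return nodes


def is_pruning(nodes, target, n):
    """True if `target` partitions points 0..n-1 and each cluster is a tree node."""
    clusters = [frozenset(c) for c in target]
    covered = sorted(i for c in clusters for i in c)
    node_set = set(nodes)
    return covered == list(range(n)) and all(c in node_set for c in clusters)


# Hypothetical similarity matrix: points 0-2 form one group, points 3-4 another.
sim = [
    [1.0, 0.9, 0.8, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.3, 0.2],
    [0.8, 0.7, 1.0, 0.1, 0.3],
    [0.2, 0.3, 0.1, 1.0, 0.9],
    [0.1, 0.2, 0.3, 0.9, 1.0],
]
nodes = average_linkage_tree(sim)
print(is_pruning(nodes, [{0, 1, 2}, {3, 4}], len(sim)))
```

In this toy matrix every point is more similar to each point of its own group than to any point outside it, so the intended two-cluster partition shows up as a pruning of the tree. Which properties of the similarity function guarantee this in general is the question the paper studies.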