Correlation clustering with noisy input

Authors:
Claire Mathieu;Warren Schudy
Affiliations:
Brown University;Brown University
Venue:
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Year:
2010

Abstract

Correlation clustering is a type of clustering that uses a basic form of input data: for every pair of data items, the input specifies whether they are similar (belonging to the same cluster) or dissimilar (belonging to different clusters). This information may be inconsistent, and the goal is to find a clustering (partition of the vertices) that disagrees with as few pieces of information as possible. Correlation clustering is APX-hard for worst-case inputs. We study the following semi-random noisy model to generate the input: start from an arbitrary partition of the vertices into clusters. Then, for each pair of vertices, the similarity information is corrupted (noisy) independently with probability $p$. Finally, an adversary generates the input by choosing similarity/dissimilarity information arbitrarily for each corrupted pair of vertices. In this model, our algorithm produces a clustering with cost at most $1 + O(n^{-1/6})$ times the cost of the optimal clustering, as long as $p \le 1/2 - n^{-1/3}$. Moreover, if all clusters have size at least $c_1 \sqrt{n}$ then we can exactly reconstruct the planted clustering. If the noise $p$ is small, that is, $p \le n^{-\delta}/60$, then we can exactly reconstruct all clusters of the planted clustering that have size at least $3150/\delta$, and provide a certificate (witness) proving that those clusters are in any optimal clustering. Among other techniques, we use the natural semi-definite programming relaxation followed by an interesting rounding phase. The analysis uses SDP duality and spectral properties of random matrices.
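
The semi-random input model in the abstract can be illustrated with a short Python sketch. This is not the authors' code and it does not implement their SDP-based algorithm. It only generates an input from a planted partition and counts the disagreements of a candidate clustering. The function names (`planted_labels`, `noisy_input`, `disagreements`, `flip`) are hypothetical. The adversary is simplified to a per-pair callback, whereas the adversary in the model may choose the corrupted labels jointly.

```python
import itertools
import random


def planted_labels(cluster_sizes):
    """Return a list mapping each vertex to the index of its planted cluster."""
    labels = []
    for cluster_id, size in enumerate(cluster_sizes):
        labels.extend([cluster_id] * size)
    return labels


def noisy_input(labels, p, adversary, rng):
    """Generate pairwise similarity labels under the semi-random model.

    Each pair is corrupted independently with probability p; the
    adversary callback (an assumption of this sketch) then picks the
    label reported for every corrupted pair.
    Returns (similar, corrupted): a dict mapping each pair (u, v) with
    u < v to True (similar) or False (dissimilar), and the set of
    corrupted pairs.
    """
    similar = {}
    corrupted = set()
    for u, v in itertools.combinations(range(len(labels)), 2):
        truth = labels[u] == labels[v]
        if rng.random() < p:
            corrupted.add((u, v))
            similar[(u, v)] = adversary(u, v, truth)
        else:
            similar[(u, v)] = truth
    return similar, corrupted


def disagreements(similar, clustering):
    """Count the pairs whose input label contradicts the clustering."""
    return sum(
        is_similar != (clustering[u] == clustering[v])
        for (u, v), is_similar in similar.items()
    )


def flip(u, v, truth):
    """Hypothetical adversary: always report the opposite of the truth."""
    return not truth


if __name__ == "__main__":
    rng = random.Random(0)
    labels = planted_labels([4, 3, 3])
    similar, corrupted = noisy_input(labels, p=0.1, adversary=flip, rng=rng)
    # With the flipping adversary, the planted clustering disagrees with
    # exactly the corrupted pairs.
    print(disagreements(similar, labels) == len(corrupted))
```

With the `flip` adversary, the cost of the planted clustering equals the number of corrupted pairs. A less aggressive adversary can leave some corrupted pairs consistent with the planted partition, so in general that count is only an upper bound on the planted clustering's cost.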