Correlation Clustering: maximizing agreements via semidefinite programming

Authors:
Chaitanya Swamy
Affiliations:
Cornell University, Ithaca, NY
Venue:
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Year:
2004

Citing 4
Cited 28

Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming

Journal of the ACM (JACM)
A new way of using semidefinite programming with applications to linear equations mod p

Journal of Algorithms
Correlation Clustering

FOCS '02 Proceedings of the 43rd Symposium on Foundations of Computer Science
Clustering with Qualitative Information

FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science

Correlation Clustering

Machine Learning
Clustering Aggregation

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
A divide-and-merge methodology for clustering

Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
On the use of linear programming for unsupervised text classification

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Supervised clustering with support vector machines

ICML '05 Proceedings of the 22nd international conference on Machine learning
Error bounds for correlation clustering

ICML '05 Proceedings of the 22nd international conference on Machine learning
Correlation clustering with a fixed number of clusters

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
On the approximability of maximum and minimum edge clique partition problems

CATS '06 Proceedings of the 12th Computing: The Australasian Theory Symposium - Volume 51
Correlation clustering in general weighted graphs

Theoretical Computer Science - Approximation and online algorithms
A divide-and-merge methodology for clustering

ACM Transactions on Database Systems (TODS)
Clustering aggregation

ACM Transactions on Knowledge Discovery from Data (TKDD)
Supervised clustering of streaming data for email batch detection

Proceedings of the 24th international conference on Machine learning
A rigorous analysis of population stratification with limited data

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
A discriminative framework for clustering via similarity functions

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
On the Approximation of Correlation Clustering and Consensus Clustering

Journal of Computer and System Sciences
A note on the inapproximability of correlation clustering

Information Processing Letters
Collaborative partitioning with maximum user satisfaction

Proceedings of the 17th ACM conference on Information and knowledge management
Steps toward managing lineage metadata in grid clusters

TAPP'09 First workshop on Theory and practice of provenance
Framework for evaluating clustering algorithms in duplicate detection

Proceedings of the VLDB Endowment
Correlation clustering with noisy input

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
A polynomial time approximation scheme for k-consensus clustering

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Sentiment community detection in social networks

Proceedings of the 2011 iConference
Clustering causal relationships in genes expression data

WIRN'05 Proceedings of the 16th Italian conference on Neural Nets
Correlation clustering and consensus clustering

ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
A randomized PTAS for the minimum Consensus Clustering with a fixed number of clusters

Theoretical Computer Science
On the complexity of Newman's community finding approach for biological and social networks

Journal of Computer and System Sciences
On the approximability of maximum and minimum edge clique partition problems

CATS '06 Proceedings of the Twelfth Computing: The Australasian Theory Symposium - Volume 51
Correlation clustering with stochastic labellings

SIMBAD'13 Proceedings of the Second international conference on Similarity-Based Pattern Recognition

Abstract

We consider the Correlation Clustering problem introduced in [2]. Given a graph G = (V,E) where each edge is labeled either "+" (similar) or "-" (different), we want to cluster the nodes so that the + edges lie within the clusters and the - edges lie between clusters. Specifically, we want to maximize agreements: the number of + edges within clusters plus the number of - edges between clusters. This problem is NP-Hard [2]. We give a 0.7666-approximation algorithm for maximizing agreements on any graph, even when the edges have non-negative weights (along with labels) and we want to maximize the weight of agreements. These were posed as open problems in [2]. Previously the only results known were a trivial 0.5-approximation for arbitrary edge-weighted graphs, and a PTAS with unit edge weights when |E| = Ω(|V|^2). Somewhat surprisingly, our algorithm always produces a clustering with at most 6 clusters. As a corollary we get a 0.7666-approximation algorithm for the k-clustering variant of the problem where we may create at most k clusters. A major component of this algorithm is a simple, easy-to-analyze algorithm that by itself achieves an approximation ratio of 0.75, opening at most 4 clusters.
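
To make the objective concrete, the following is a minimal sketch that scores a clustering by its weighted agreements and computes a simple baseline. It is not the paper's semidefinite programming algorithm. The baseline uses one standard argument for a 0.5 bound: placing all nodes in one cluster satisfies every + edge, placing every node in its own cluster satisfies every - edge, so the better of the two covers at least half of the total weight. The names `agreement_weight` and `trivial_half_approximation`, the edge tuple layout, and the example graph are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical edge format: (u, v, label, weight), with label "+" or "-".
Edge = Tuple[Hashable, Hashable, str, float]


def agreement_weight(edges: List[Edge], cluster_of: Dict[Hashable, int]) -> float:
    """Total weight of + edges inside clusters plus - edges between clusters."""
    total = 0.0
    for u, v, label, weight in edges:
        same = cluster_of[u] == cluster_of[v]
        if (label == "+" and same) or (label == "-" and not same):
            total += weight
    return total


def trivial_half_approximation(
    nodes: List[Hashable], edges: List[Edge]
) -> Dict[Hashable, int]:
    """Return the better of 'one big cluster' and 'all singletons'."""
    one_cluster = {v: 0 for v in nodes}                # satisfies every + edge
    singletons = {v: i for i, v in enumerate(nodes)}   # satisfies every - edge
    return max((one_cluster, singletons), key=lambda c: agreement_weight(edges, c))


# Small illustrative instance (made up for this sketch).
nodes = ["a", "b", "c", "d"]
edges = [
    ("a", "b", "+", 1.0),
    ("b", "c", "+", 1.0),
    ("a", "c", "-", 1.0),
    ("c", "d", "-", 2.0),
]
baseline = trivial_half_approximation(nodes, edges)
baseline_score = agreement_weight(edges, baseline)
better = {"a": 0, "b": 0, "c": 0, "d": 1}
better_score = agreement_weight(edges, better)
```

On this instance the baseline picks the all-singletons clustering, while the hand-chosen clustering `better` scores higher, which shows the gap that stronger approximation algorithms aim to close.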