Correlation clustering with a fixed number of clusters

Authors:
Ioannis Giotis;Venkatesan Guruswami
Affiliations:
University of Washington, Seattle, WA;University of Washington, Seattle, WA
Venue:
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Year:
2006

Citing (14):

Property testing and its connection to learning and approximation
Journal of the ACM (JACM)

Cluster Graph Modification Problems
WG '02 Revised Papers from the 28th International Workshop on Graph-Theoretic Concepts in Computer Science

Approximation schemes for clustering problems
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing

A Randomized Approximation Scheme for Metric MAX-CUT
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science

A Sublinear Time Approximation Scheme for Clustering in Metric Spaces
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science

Correlation Clustering: maximizing agreements via semidefinite programming
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms

Correlation Clustering
Machine Learning

Maximizing Quadratic Programs: Extending Grothendieck's Inequality
FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science

Optimal Inapproximability Results for Max-Cut and Other 2-Variable CSPs?
FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science

Quadratic forms on graphs
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing

O(√log n) approximation algorithms for min UnCut, min 2CNF deletion, and directed cut problems
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing

Aggregating inconsistent information: ranking and clustering
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing

On Non-Approximability for Quadratic Programs
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science

Clustering with qualitative information
Journal of Computer and System Sciences - Special issue: Learning theory 2003

Cited by (12):

Seeking stable clusters in the blogosphere
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Theory research at Google
ACM SIGACT News

A Local-Search 2-Approximation for 2-Correlation-Clustering
ESA '08 Proceedings of the 16th annual European symposium on Algorithms

Efficient query routing by improved peer description in P2P networks
Proceedings of the 3rd international conference on Scalable information systems

A more effective linear kernelization for cluster editing
Theoretical Computer Science

An online blog reading system by topic clustering and personalized ranking
ACM Transactions on Internet Technology (TOIT)

Correlation Clustering Revisited: The "True" Cost of Error Minimization Problems
ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I

On finding graph clusterings with maximum modularity
WG'07 Proceedings of the 33rd international conference on Graph-theoretic concepts in computer science

Clustering with diversity
ICALP'10 Proceedings of the 37th international colloquium conference on Automata, languages and programming

Improved approximation algorithms for bipartite correlation clustering
ESA'11 Proceedings of the 19th European conference on Algorithms

Chromatic correlation clustering
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

A more effective linear kernelization for Cluster Editing
ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies

Abstract

We continue the investigation of problems concerning correlation clustering, or clustering with qualitative information, a clustering formulation that has been studied recently [5, 7, 8, 3]. The basic setup is that we are given as input a complete graph on n nodes (which correspond to the items to be clustered) whose edges are labeled + (for similar pairs of items) and - (for dissimilar pairs of items). Thus the input provides only qualitative information on similarity and no quantitative distance measure between items. The quality of a clustering is measured by its number of agreements, which is the number of edges it correctly classifies: the number of - edges whose endpoints it places in different clusters plus the number of + edges whose endpoints it places within the same cluster.

In this paper, we study the problem of finding clusterings that maximize the number of agreements, and the complementary minimization version where we seek clusterings that minimize the number of disagreements. We focus on the situation where the number of clusters is stipulated to be a small constant k. Our main result is that for every k, there is a polynomial time approximation scheme (PTAS) for both maximizing agreements and minimizing disagreements. (The problems are NP-hard for every k ≥ 2.) The main technical work is for the minimization version, as the PTAS for maximizing agreements follows along the lines of the property tester for Max k-CUT from [13].

In contrast, when the number of clusters is not specified, the problem of minimizing disagreements was shown to be APX-hard [7], even though the maximization version admits a PTAS.
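To make the objective concrete, here is a minimal Python sketch of how agreements and disagreements could be counted for a given clustering of a complete signed graph. It is an illustration only and is not the approximation scheme from the paper: the function names `count_agreements` and `best_k_clustering`, the representation of + edges as a set of `frozenset` pairs, the choice to allow at most k clusters, and the toy instance are all assumptions made for this example. The exhaustive search over all k^n labelings is feasible only for very small n.

```python
from itertools import combinations, product


def count_agreements(n, positive_edges, labels):
    """Return (agreements, disagreements) for a clustering of a complete signed graph.

    n              -- number of items, indexed 0..n-1
    positive_edges -- set of frozenset({u, v}) pairs labeled +; all other pairs are -
    labels         -- labels[u] is the cluster index assigned to item u
    """
    agree = 0
    for u, v in combinations(range(n), 2):
        is_plus = frozenset((u, v)) in positive_edges
        same_cluster = labels[u] == labels[v]
        # A + edge agrees when its endpoints share a cluster;
        # a - edge agrees when its endpoints are separated.
        if is_plus == same_cluster:
            agree += 1
    total_pairs = n * (n - 1) // 2
    return agree, total_pairs - agree


def best_k_clustering(n, positive_edges, k):
    """Exhaustively search all k**n labelings (at most k clusters); tiny n only."""
    best = max(
        product(range(k), repeat=n),
        key=lambda lab: count_agreements(n, positive_edges, lab)[0],
    )
    return best, count_agreements(n, positive_edges, best)


# Hypothetical toy instance: items 0, 1, 2 are mutually similar, items 3 and 4
# are similar to each other, and every other pair is dissimilar.
plus = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (3, 4)]}
labels, (agree, disagree) = best_k_clustering(5, plus, k=2)
print(labels, agree, disagree)
```

Because every pair of nodes is either an agreement or a disagreement, the two counts always sum to n(n-1)/2, so the same clustering is optimal for both objectives. The two problems still differ as approximation problems, since a good relative guarantee on one count does not imply a good relative guarantee on the other.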