Correlation Clustering: maximizing agreements via semidefinite programming

  • Authors:
  • Chaitanya Swamy

  • Affiliations:
  • Cornell University, Ithaca, NY

  • Venue:
  • SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

We consider the Correlation Clustering problem introduced in [2]. Given a graph G = (V,E) where each edge is labeled either "+" (similar) or "-" (different), we want to cluster the nodes so that the + edges lie within the clusters and the -- edges lie between clusters. Specifically, we want to maximize agreements --- the number of + edges within clusters and -- edges between clusters. This problem is NP-Hard [2]. We give a 0.7666-approximation algorithm for maximizing agreements on any graph even when the edges have non-negative weights (along with labels) and we want to maximize the weight of agreements. These were posed as open problems in [2]. Previously the only results known were a trivial 0.5-approximation for arbitrary edge weighted graphs, and a PTAS with unit edge weights when |E| = Ω(|V|2). Somewhat surprisingly, our algorithm always produces a clustering with at most 6 clusters. As a corollary we get a 0.7666-approximation algorithm for the k-clustering variant of the problem where we may create at most k clusters. A major component of this algorithm is a simple, easy-to-analyze algorithm that by itself achieves an approximation ratio of 0.75, opening at most 4 clusters.