Correlation clustering in general weighted graphs

  • Authors:
  • Erik D. Demaine; Dotan Emanuel; Amos Fiat; Nicole Immorlica

  • Affiliations:
  • Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA; Department of Computer Science, School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel; Department of Computer Science, School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA

  • Venue:
  • Theoretical Computer Science - Approximation and online algorithms
  • Year:
  • 2006

Abstract

We consider the following general correlation-clustering problem [N. Bansal, A. Blum, S. Chawla, Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238-250]: given a graph with real nonnegative edge weights and a 〈+〉/〈-〉 edge labelling, partition the vertices into clusters to minimize the total weight of cut 〈+〉 edges and uncut 〈-〉 edges. Thus, 〈+〉 edges with large weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster, while 〈-〉 edges with large weights encourage the endpoints to belong to different clusters. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be best possible by the problem definition.

Correlation clustering was introduced by Bansal et al. [Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238-250], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has the same weight. We give an O(log n)-approximation algorithm for the general case based on linear-programming rounding and the "region-growing" technique. We also prove that this linear program has an integrality gap of Ω(log n), and therefore our approximation is tight under this approach. We also give an O(r³)-approximation algorithm for K_{r,r}-minor-free graphs. On the other hand, we show that the problem is equivalent to minimum multicut, and is therefore APX-hard and difficult to approximate better than Θ(log n).
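To make the objective concrete, here is a minimal sketch (not the authors' algorithm) that evaluates the disagreement cost of a given partition and finds an optimal clustering by exhaustive search over all set partitions. The edge representation as (u, v, weight, label) tuples and the names disagreement_cost, set_partitions and brute_force_optimum are hypothetical choices for illustration. Note that the search never fixes the number of clusters in advance, mirroring the problem definition; it is exponential (Bell-number many partitions) and only usable on tiny graphs, whereas the paper's O(log n) algorithm instead rounds a linear program.

```python
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple

# Hypothetical edge format: (u, v, weight, label) with label "+" or "-".
Edge = Tuple[Hashable, Hashable, float, str]


def disagreement_cost(edges: List[Edge], cluster_of: Dict[Hashable, int]) -> float:
    """Total weight of cut <+> edges plus uncut <-> edges."""
    cost = 0.0
    for u, v, w, label in edges:
        same = cluster_of[u] == cluster_of[v]
        if label == "+" and not same:
            cost += w  # positive edge split across two clusters
        elif label == "-" and same:
            cost += w  # negative edge kept inside one cluster
    return cost


def set_partitions(items: Sequence[Hashable]) -> Iterator[List[List[Hashable]]]:
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new singleton block.
        yield [[first]] + partition


def brute_force_optimum(vertices: Sequence[Hashable], edges: List[Edge]):
    """Exhaustive search: exact, but only feasible for a handful of vertices."""
    best_cost, best = float("inf"), None
    for partition in set_partitions(list(vertices)):
        cluster_of = {v: i for i, block in enumerate(partition) for v in block}
        c = disagreement_cost(edges, cluster_of)
        if c < best_cost:
            best_cost, best = c, partition
    return best_cost, best


# Example: a weighted triangle with two <+> edges and one <-> edge.
EXAMPLE_EDGES: List[Edge] = [
    ("a", "b", 3.0, "+"),
    ("b", "c", 2.0, "+"),
    ("a", "c", 1.0, "-"),
]

if __name__ == "__main__":
    cost, partition = brute_force_optimum(["a", "b", "c"], EXAMPLE_EDGES)
    print(cost, partition)  # 1.0 [['a', 'b', 'c']]
```

The triangle has no zero-cost clustering: every partition violates at least one label, so the optimum pays for the lightest violation (keeping the 〈-〉 edge a–c uncut, weight 1) rather than cutting a heavier 〈+〉 edge.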