Correlation clustering in general weighted graphs

Authors:
Erik D. Demaine;Dotan Emanuel;Amos Fiat;Nicole Immorlica
Affiliations:
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA;Department of Computer Science, School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel;Department of Computer Science, School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel;Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA
Venue:
Theoretical Computer Science - Approximation and online algorithms
Year:
2006

Citing 19
Cited 22

A unified approach to approximation algorithms for bottleneck problems

Journal of the ACM (JACM)
Excluded minors, network decomposition, and multicommodity flow

STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Improved bounds for the max-flow min-multicut ratio for planar and K_{r,r}-free graphs

Information Processing Letters
The Complexity of Multiterminal Cuts

SIAM Journal on Computing
Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms

Journal of the ACM (JACM)
Approximation algorithms

Approximation algorithms
Approximate Max-Flow Min-(Multi)Cut Theorems and Their Applications

SIAM Journal on Computing
An Efficient k-Means Clustering Algorithm: Analysis and Implementation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Correlation Clustering

FOCS '02 Proceedings of the 43rd Symposium on Foundations of Computer Science
Constrained K-means Clustering with Background Knowledge

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Clustering with Instance-level Constraints

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Cutting and Partitioning a Graph after a Fixed Pattern (Extended Abstract)

Proceedings of the 10th Colloquium on Automata, Languages and Programming
Primal-Dual Approximation Algorithms for Metric Facility Location and k-Median Problems

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Efficient location area planning for personal communication systems

Proceedings of the 9th annual international conference on Mobile computing and networking
Clustering with Qualitative Information

FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Correlation Clustering: maximizing agreements via semidefinite programming

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Maximizing Quadratic Programs: Extending Grothendieck's Inequality

FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science
An experimental comparison of several clustering and initialization methods

UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence

Clustering aggregation

ACM Transactions on Knowledge Discovery from Data (TKDD)
Clustering with Partial Information

MFCS '08 Proceedings of the 33rd international symposium on Mathematical Foundations of Computer Science
Solution stability in linear programming relaxations: graph partitioning and unsupervised learning

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Bounding and comparing methods for correlation clustering beyond ILP

ILP '09 Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing
Constant ratio fixed-parameter approximation of the edge multicut problem

Information Processing Letters
Gesture improves coreference resolution

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Creating probabilistic databases from duplicated data

The VLDB Journal — The International Journal on Very Large Data Bases
Framework for evaluating clustering algorithms in duplicate detection

Proceedings of the VLDB Endowment
Clustering with partial information

Theoretical Computer Science
Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values

Journal of Biomedical Informatics
Correlation clustering with noisy input

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
A polynomial time approximation scheme for k-consensus clustering

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
A sharp concentration-based adaptive segmentation algorithm

ISVC'10 Proceedings of the 6th international conference on Advances in visual computing - Volume Part II
Towards a more discriminative and semantic visual vocabulary

Computer Vision and Image Understanding
Fixed-parameter tractability of multicut parameterized by the size of the cutset

Proceedings of the forty-third annual ACM symposium on Theory of computing
Improved approximation algorithms for bipartite correlation clustering

ESA'11 Proceedings of the 19th European conference on Algorithms
Overcoming browser cookie churn with clustering

Proceedings of the fifth ACM international conference on Web search and data mining
Globally optimal closed-surface segmentation for connectomics

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
Fast planar correlation clustering for image segmentation

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI
Finding small separators in linear time via treewidth reduction

ACM Transactions on Algorithms (TALG)
Clustering and outlier detection using isoperimetric number of trees

Pattern Recognition
Correlation clustering with stochastic labellings

SIMBAD'13 Proceedings of the Second international conference on Similarity-Based Pattern Recognition

Abstract

We consider the following general correlation-clustering problem [N. Bansal, A. Blum, S. Chawla, Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238-250]: given a graph with real nonnegative edge weights and a 〈+〉/〈-〉 edge labelling, partition the vertices into clusters to minimize the total weight of cut 〈+〉 edges and uncut 〈-〉 edges. Thus, 〈+〉 edges with large weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster, while 〈-〉 edges with large weights encourage the endpoints to belong to different clusters. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be best possible by the problem definition.

Correlation clustering was introduced by Bansal et al. [Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238-250], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has the same weight. We give an O(log n)-approximation algorithm for the general case based on linear-programming rounding and the "region-growing" technique. We also prove that this linear program has a gap of Ω(log n), and therefore our approximation is tight under this approach. We also give an O(r^3)-approximation algorithm for K_{r,r}-minor-free graphs. On the other hand, we show that the problem is equivalent to minimum multicut, and therefore APX-hard and difficult to approximate better than Θ(log n).
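To make the objective in the abstract concrete, the following is a minimal Python sketch that evaluates the cost of a given clustering: the total weight of 〈+〉 edges whose endpoints are separated plus the total weight of 〈-〉 edges whose endpoints share a cluster. The edge representation, the function name `disagreement_cost`, and the toy graph are illustrative assumptions and do not come from the paper; the sketch only scores a clustering and does not implement the paper's LP rounding or region-growing algorithm.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical edge format (an assumption for this sketch):
# (endpoint u, endpoint v, label "+" or "-", nonnegative weight)
Edge = Tuple[Hashable, Hashable, str, float]


def disagreement_cost(edges: Iterable[Edge],
                      cluster_of: Dict[Hashable, int]) -> float:
    """Total weight of cut "+" edges and uncut "-" edges."""
    cost = 0.0
    for u, v, label, weight in edges:
        same_cluster = cluster_of[u] == cluster_of[v]
        if label == "+" and not same_cluster:
            cost += weight  # a "+" edge was cut
        elif label == "-" and same_cluster:
            cost += weight  # a "-" edge was left uncut
    return cost


# Toy instance: a triangle with two "+" edges and one heavy "-" edge.
edges = [
    ("a", "b", "+", 2.0),
    ("b", "c", "+", 1.0),
    ("a", "c", "-", 3.0),
]

one_cluster = {"a": 0, "b": 0, "c": 0}   # pays for the uncut "-" edge
split_c = {"a": 0, "b": 0, "c": 1}       # pays for the cut "+" edge b-c
singletons = {"a": 0, "b": 1, "c": 2}    # pays for both cut "+" edges

print(disagreement_cost(edges, one_cluster))
print(disagreement_cost(edges, split_c))
print(disagreement_cost(edges, singletons))
```

On this toy instance no clustering has zero cost, and the number of clusters is not fixed in advance: it is simply whatever minimizes the cost, as the problem definition implies.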