Correlation Clustering

Authors:
Nikhil Bansal; Avrim Blum; Shuchi Chawla
Affiliations:
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. nikhil@cs.cmu.edu; Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. avrim@cs.cmu.edu; Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. shuchi@cs.cmu.edu
Venue:
Machine Learning
Year:
2004

Citing 18
Cited 124

A unified approach to approximation algorithms for bottleneck problems

Journal of the ACM (JACM)
Toward Efficient Agnostic Learning

Machine Learning - Special issue on computational learning theory, COLT'92
MAX-CUT has a randomized approximation scheme in dense graphs

Random Structures & Algorithms
An Õ(n^{3/14})-coloring algorithm for 3-colorable graphs

Information Processing Letters
Property testing and its connection to learning and approximation

Journal of the ACM (JACM)
Efficient noise-tolerant learning from statistical queries

Journal of the ACM (JACM)
Polynomial time approximation schemes for dense instances of NP-hard problems

Journal of Computer and System Sciences
Clustering for edge-cost minimization (extended abstract)

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Algorithms for graph partitioning on the planted partition model

Random Structures & Algorithms
Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation

Journal of the ACM (JACM)
Computers and Intractability; A Guide to the Theory of NP-Completeness

Computers and Intractability; A Guide to the Theory of NP-Completeness
Testing the diameter of graphs

Random Structures & Algorithms
Improved Algorithms for the Random Cluster Graph Model

SWAT '02 Proceedings of the 8th Scandinavian Workshop on Algorithm Theory
Learning to match and cluster large high-dimensional data sets for data integration

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Spectral Partitioning of Random Graphs

FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
Clustering with Qualitative Information

FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Correlation Clustering: maximizing agreements via semidefinite programming

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms

Aggregating inconsistent information: ranking and clustering

Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
On Non-Approximability for Quadratic Programs

FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Supervised clustering with support vector machines

ICML '05 Proceedings of the 22nd international conference on Machine learning
Error bounds for correlation clustering

ICML '05 Proceedings of the 22nd international conference on Machine learning
Correlation clustering with a fixed number of clusters

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Clustering with qualitative information

Journal of Computer and System Sciences - Special issue: Learning theory 2003
On the approximability of maximum and minimum edge clique partition problems

CATS '06 Proceedings of the 12th Computing: The Australasian Theory Symposium - Volume 51
Centralized and Distributed Multi-view Correspondence

International Journal of Computer Vision
Duplicate Record Detection: A Survey

IEEE Transactions on Knowledge and Data Engineering
Clustering aggregation

ACM Transactions on Knowledge Discovery from Data (TKDD)
Efficient location area planning for personal communication systems

IEEE/ACM Transactions on Networking (TON)
Leveraging aggregate constraints for deduplication

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Community Mining from Signed Social Networks

IEEE Transactions on Knowledge and Data Engineering
Seeking stable clusters in the blogosphere

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Approximate clustering of incomplete fingerprints

Journal of Discrete Algorithms
An optimal sdp algorithm for max-cut, and equally optimal long code tests

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
A discriminative framework for clustering via similarity functions

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Approximation algorithms for co-clustering

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Spectral clustering with inconsistent advice

Proceedings of the 25th international conference on Machine learning
Aggregating inconsistent information: Ranking and clustering

Journal of the ACM (JACM)
A note on the inapproximability of correlation clustering

Information Processing Letters
A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms

SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
Clustering with Partial Information

MFCS '08 Proceedings of the 33rd international symposium on Mathematical Foundations of Computer Science
A Local-Search 2-Approximation for 2-Correlation-Clustering

ESA '08 Proceedings of the 16th annual European symposium on Algorithms
Collaborative partitioning with maximum user satisfaction

Proceedings of the 17th ACM conference on Information and knowledge management
Closest 4-leaf power is fixed-parameter tractable

Discrete Applied Mathematics
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering

ACM Transactions on Knowledge Discovery from Data (TKDD)
A more effective linear kernelization for cluster editing

Theoretical Computer Science
On the approximability of the Maximum Agreement SubTree and Maximum Compatible Tree problems

Discrete Applied Mathematics
Ranking tournaments: Local search and a new algorithm

Journal of Experimental Algorithmics (JEA)
An online blog reading system by topic clustering and personalized ranking

ACM Transactions on Internet Technology (TOIT)
Fixed-Parameter Algorithms for Graph-Modeled Data Clustering

TAMC '09 Proceedings of the 6th Annual Conference on Theory and Applications of Models of Computation
A More Relaxed Model for Graph-Based Data Clustering: s-Plex Editing

AAIM '09 Proceedings of the 5th International Conference on Algorithmic Aspects in Information and Management
Iterative Compression for Exactly Solving NP-Hard Minimization Problems

Algorithmics of Large and Complex Networks
Correlation Clustering Revisited: The "True" Cost of Error Minimization Problems

ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I
Deterministic Pivoting Algorithms for Constrained Ranking and Clustering Problems

Mathematics of Operations Research
Graph-Based Data Clustering with Overlaps

COCOON '09 Proceedings of the 15th Annual International Conference on Computing and Combinatorics
An Evidence Accumulation Approach to Constrained Clustering Combination

MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
Get out the vote: determining support or opposition from congressional floor-debate transcripts

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Learning field compatibilities to extract database records from unstructured text

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Bounding and comparing methods for correlation clustering beyond ILP

ILP '09 Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing
Constant ratio fixed-parameter approximation of the edge multicut problem

Information Processing Letters
Correlation clustering for crosslingual link detection

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Practical Markov logic containing first-order quantifiers with application to identity uncertainty

CHSLP '06 Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing
Creating probabilistic databases from duplicated data

The VLDB Journal — The International Journal on Very Large Data Bases
Automobile, car and BMW: horizontal and hierarchical approach in social tagging systems

Proceedings of the 2nd ACM workshop on Social web search and mining
A graph-theoretical clustering method based on two rounds of minimum spanning trees

Pattern Recognition
Metric learning for semi-supervised clustering using pairwise constraints and the geometrical structure of data

Intelligent Data Analysis
A multiple-perspective approach to constructing and aggregating Citation Semantic Link Network

Future Generation Computer Systems
Framework for evaluating clustering algorithms in duplicate detection

Proceedings of the VLDB Endowment
Editing Graphs into Disjoint Unions of Dense Clusters

ISAAC '09 Proceedings of the 20th International Symposium on Algorithms and Computation
Clustering with partial information

Theoretical Computer Science
Structural inference of hierarchies in networks

ICML'06 Proceedings of the 2006 conference on Statistical network analysis
Clustering query refinements by user intent

Proceedings of the 19th international conference on World wide web
Non-linear metric learning using pairwise similarity and dissimilarity constraints and the geometrical structure of data

Pattern Recognition
Inapproximability of maximum weighted edge biclique and its applications

TAMC'08 Proceedings of the 5th international conference on Theory and applications of models of computation
Improved algorithms for bicluster editing

TAMC'08 Proceedings of the 5th international conference on Theory and applications of models of computation
Probabilistic structured predictors

UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values

Journal of Biomedical Informatics
The UGC Hardness Threshold of the Lp Grothendieck Problem

Mathematics of Operations Research
Untangling the cross-lingual link structure of Wikipedia

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Improved consensus clustering via linear programming

ACSC '10 Proceedings of the Thirty-Third Australasian Conference on Computer Science - Volume 102
Correlation clustering with noisy input

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
A polynomial time approximation scheme for k-consensus clustering

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Clustering with diversity

ICALP'10 Proceedings of the 37th international colloquium conference on Automata, languages and programming
Record linkage with uniqueness constraints and erroneous values

Proceedings of the VLDB Endowment
Towards a more discriminative and semantic visual vocabulary

Computer Vision and Image Understanding
Alternative parameterizations for cluster editing

SOFSEM'11 Proceedings of the 37th international conference on Current trends in theory and practice of computer science
Disentangling chat

Computational Linguistics
Automatic discovery of attributes in relational databases

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Fixed-parameter tractability of multicut parameterized by the size of the cutset

Proceedings of the forty-third annual ACM symposium on Theory of computing
PLINI: a probabilistic logic program framework for inconsistent news information

Logic programming, knowledge representation, and nonmonotonic reasoning
Visual word aggregation

IbPRIA'11 Proceedings of the 5th Iberian conference on Pattern recognition and image analysis
Clustering with local restrictions

ICALP'11 Proceedings of the 38th international colloquium conference on Automata, languages and programming - Volume Part I
Can everybody sit closer to their friends than their enemies?

MFCS'11 Proceedings of the 36th international conference on Mathematical foundations of computer science
The minimum transfer cost principle for model-order selection

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I
Improved approximation algorithms for bipartite correlation clustering

ESA'11 Proceedings of the 19th European conference on Algorithms
A 2k kernel for the cluster editing problem

Journal of Computer and System Sciences
Methodological Review: Coreference resolution: A review of general methodologies and applications in the clinical domain

Journal of Biomedical Informatics
A More Relaxed Model for Graph-Based Data Clustering: $s$-Plex Cluster Editing

SIAM Journal on Discrete Mathematics
Fitting Tree Metrics: Hierarchical Clustering and Phylogeny

SIAM Journal on Computing
Convergence and approximation in potential games

STACS'06 Proceedings of the 23rd Annual conference on Theoretical Aspects of Computer Science
Exploiting Web querying for Web people search

ACM Transactions on Database Systems (TODS)
Quality-aware similarity assessment for entity matching in Web data

Information Systems
On the NP-Completeness of some graph cluster measures

SOFSEM'06 Proceedings of the 32nd conference on Current Trends in Theory and Practice of Computer Science
Overcoming browser cookie churn with clustering

Proceedings of the fifth ACM international conference on Web search and data mining
Correlation clustering and consensus clustering

ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Fixed-parameter tractable generalizations of cluster editing

CIAC'06 Proceedings of the 6th Italian conference on Algorithms and Complexity
Information distances over clusters

ISNN'10 Proceedings of the 7th international conference on Advances in Neural Networks - Volume Part I
Pruning training samples using a supervised clustering algorithm

ISNN'10 Proceedings of the 7th international conference on Advances in Neural Networks - Volume Part II
A randomized PTAS for the minimum Consensus Clustering with a fixed number of clusters

Theoretical Computer Science
On the parameterized complexity of consensus clustering

ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
Convergence and approximation in potential games

Theoretical Computer Science
Optimal partitions in additively separable hedonic games

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume One
Survey: Graph clustering

Computer Science Review
Graph-based data clustering with overlaps

Discrete Optimization
Hedonic clustering games

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Subspace clustering

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Visualization of Global Correlation Structures in Uncertain 2D Scalar Fields

Computer Graphics Forum
Subspace correlation clustering: finding locally correlated dimensions in subspace projections of the data

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Discriminative clustering for market segmentation

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Chromatic correlation clustering

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
On editing graphs into 2-club clusters

FAW-AAIM'12 Proceedings of the 6th international Frontiers in Algorithmics, and Proceedings of the 8th international conference on Algorithmic Aspects in Information and Management
Dynamic reconfiguration in modular robots using graph partitioning-based coalitions

Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Cluster editing with locally bounded modifications

Discrete Applied Mathematics
Detecting subgroups in online discussions by modeling positive and negative relations among participants

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Scalable clustering of signed networks using balance normalized cut

Proceedings of the 21st ACM international conference on Information and knowledge management
Routing state distance: a path-based metric for network analysis

Proceedings of the 2012 ACM conference on Internet measurement conference
A more effective linear kernelization for Cluster Editing

ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
On the complexity of Newman's community finding approach for biological and social networks

Journal of Computer and System Sciences
Globally optimal closed-surface segmentation for connectomics

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
Fast planar correlation clustering for image segmentation

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI
Clustering with local restrictions

Information and Computation
A machine learning approach for instance matching based on similarity metrics

ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I
Computing desirable partitions in additively separable hedonic games

Artificial Intelligence
Coalition structure generation over graphs

Journal of Artificial Intelligence Research
Clustering under approximation stability

Journal of the ACM (JACM)
Leveraging transitive relations for crowdsourced joins

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
A power-driven thermal sensor placement algorithm for dynamic thermal management

Proceedings of the Conference on Design, Automation and Test in Europe
Finding small separators in linear time via treewidth reduction

ACM Transactions on Algorithms (TALG)
Correlation clustering with stochastic labellings

SIMBAD'13 Proceedings of the Second international conference on Similarity-Based Pattern Recognition
Break and conquer: efficient correlation clustering for image segmentation

SIMBAD'13 Proceedings of the Second international conference on Similarity-Based Pattern Recognition
Question selection for crowd entity resolution

Proceedings of the VLDB Endowment

Abstract

We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of “agnostic learning” problem.

An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and for maximizing agreements. For minimizing disagreements, we give a constant factor approximation. For maximizing agreements we give a PTAS, building on ideas of Goldreich, Goldwasser, and Ron (1998) and de la Vega (1996). We also show how to extend some of these results to graphs with edge labels in [−1, +1], and give some results for the case of random noise.
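As a concrete illustration of the objective described in the abstract, the following minimal Python sketch counts agreements and disagreements for a given clustering of a small signed complete graph. It only evaluates the objective and is not one of the paper's approximation algorithms. The function names, the dictionary-based edge representation, and the four-vertex example are assumptions made for illustration.

```python
def count_disagreements(labels, cluster_of):
    """Count '-' edges inside clusters plus '+' edges between clusters.

    labels: dict mapping a vertex pair (u, v) to '+' or '-'
            (hypothetical representation, one entry per edge).
    cluster_of: dict mapping each vertex to its cluster id.
    """
    disagreements = 0
    for (u, v), sign in labels.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        if sign == "+" and not same_cluster:
            disagreements += 1  # similar pair split apart
        elif sign == "-" and same_cluster:
            disagreements += 1  # dissimilar pair grouped together
    return disagreements


def count_agreements(labels, cluster_of):
    """Every edge either agrees or disagrees with the clustering."""
    return len(labels) - count_disagreements(labels, cluster_of)


# Illustrative complete graph on four vertices (made-up labels).
labels = {
    ("a", "b"): "+", ("b", "c"): "+", ("a", "c"): "-",
    ("a", "d"): "-", ("b", "d"): "-", ("c", "d"): "-",
}
# Candidate clustering: {a, b, c} and {d}.
clustering = {"a": 0, "b": 0, "c": 0, "d": 1}

print(count_disagreements(labels, clustering))  # 1: the '-' edge (a, c) lies inside a cluster
print(count_agreements(labels, clustering))     # 5
```

The number of clusters is not an input here: it is determined by the clustering being scored, in line with the abstract's remark that the optimal number of clusters depends on the edge labels.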