Correlation Clustering

Authors:
Nikhil Bansal; Avrim Blum; Shuchi Chawla
Venue:
FOCS '02 Proceedings of the 43rd Symposium on Foundations of Computer Science
Year:
2002

Cited by (57)

Efficient location area planning for personal communication systems
Proceedings of the 9th annual international conference on Mobile computing and networking

Correlation Clustering: maximizing agreements via semidefinite programming
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms

A k-Median Algorithm with Running Time Independent of Data Size
Machine Learning

A New Conceptual Clustering Framework
Machine Learning

A probabilistic framework for semi-supervised clustering
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Integrating constraints and metric learning in semi-supervised clustering
ICML '04 Proceedings of the twenty-first international conference on Machine learning

An integrated, conditional model of information extraction and coreference with application to citation matching
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence

Cluster graph modification problems
Discrete Applied Mathematics - Discrete mathematics & data mining (DM & DM)

Clustering Aggregation
ICDE '05 Proceedings of the 21st International Conference on Data Engineering

Quadratic forms on graphs
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing

A divide-and-merge methodology for clustering
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

On the use of linear programming for unsupervised text classification
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

Fitting tree metrics: Hierarchical clustering and Phylogeny
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science

Error bounds for correlation clustering
ICML '05 Proceedings of the 22nd international conference on Machine learning

Semi-supervised graph clustering: a kernel approach
ICML '05 Proceedings of the 22nd international conference on Machine learning

Clustering with qualitative information
Journal of Computer and System Sciences - Special issue: Learning theory 2003

Correlation clustering in general weighted graphs
Theoretical Computer Science - Approximation and online algorithms

A divide-and-merge methodology for clustering
ACM Transactions on Database Systems (TODS)

Computing phylogenetic roots with bounded degrees and errors is NP-complete
Theoretical Computer Science - Computing and combinatorics

Machine learning for coreference resolution: from local classification to global ranking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

The complexity of non-hierarchical clustering with instance and cluster level constraints
Data Mining and Knowledge Discovery

Supervised clustering of streaming data for email batch detection
Proceedings of the 24th international conference on Machine learning

A rigorous analysis of population stratification with limited data
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms

Community Mining from Signed Social Networks
IEEE Transactions on Knowledge and Data Engineering

On the Approximation of Correlation Clustering and Consensus Clustering
Journal of Computer and System Sciences

Clustering with Partial Information
MFCS '08 Proceedings of the 33rd international symposium on Mathematical Foundations of Computer Science

Non-negative matrix factorization for semi-supervised data clustering
Knowledge and Information Systems

Semi-supervised graph clustering: a kernel approach
Machine Learning

Information Extraction
Foundations and Trends in Databases

Efficient top-k count queries over imprecise duplicates
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Swoosh: a generic approach to entity resolution
The VLDB Journal - The International Journal on Very Large Data Bases

Solution stability in linear programming relaxations: graph partitioning and unsupervised learning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning

Exploiting context analysis for combining multiple entity resolution systems
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Entity resolution with iterative blocking
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Unsupervised and semi-supervised multi-class support vector machines
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2

Improving author coreference by resource-bounded information gathering from the web
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence

Hardness of edge-modification problems
Theoretical Computer Science

Cluster graph modification problems
Discrete Applied Mathematics

Clustering with partial information
Theoretical Computer Science

Resource-bounded information gathering for correlation clustering
COLT'07 Proceedings of the 20th annual conference on Learning theory

Supervised noun phrase coreference research: the first fifteen years
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

A polygon-based methodology for mining related spatial datasets
Proceedings of the 1st ACM SIGSPATIAL International Workshop on Data Mining for Geoinformatics

Scalable clustering of news search results
Proceedings of the fourth ACM international conference on Web search and data mining

Context-based entity description rule for entity resolution
Proceedings of the 20th ACM international conference on Information and knowledge management

Clustering causal relationships in genes expression data
WIRN'05 Proceedings of the 16th Italian conference on Neural Nets

Connectivity is not a limit for kernelization: planar connected dominating set
LATIN'10 Proceedings of the 9th Latin American conference on Theoretical Informatics

Approximating the best-fit tree under Lp norms
APPROX'05/RANDOM'05 Proceedings of the 8th international workshop on Approximation, Randomization and Combinatorial Optimization Problems, and Proceedings of the 9th international conference on Randomization and Computation: algorithms and techniques

Inference protocols for coreference resolution
CoNLL Shared Task '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

On the fixed-parameter enumerability of cluster editing
WG'05 Proceedings of the 31st international conference on Graph-Theoretic Concepts in Computer Science

Error compensation in leaf root problems
ISAAC'04 Proceedings of the 15th international conference on Algorithms and Computation

Exploration of coreference resolution: the ACE entity detection and recognition task
TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue

A complete characterization of statistical query learning with applications to evolvability
Journal of Computer and System Sciences

Cluster ensembles
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Efficient parameterized preprocessing for cluster editing
FCT'07 Proceedings of the 16th international conference on Fundamentals of Computation Theory

Extracting query facets from search results
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Bootstrapping active name disambiguation with crowdsourcing
Proceedings of the 22nd ACM international conference on Information & knowledge management

Topic segmentation and labeling in asynchronous conversations
Journal of Artificial Intelligence Research

Abstract

We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or - depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters plus the number of - edges between clusters (equivalently, minimizes the number of disagreements: the number of - edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem.

An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median, min-sum, or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms both for minimizing disagreements and for maximizing agreements. For minimizing disagreements, we give a constant-factor approximation. For maximizing agreements, we give a PTAS. We also show how to extend some of these results to graphs with edge labels in [-1, +1] and give some results for the case of random noise.
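To make the objective concrete, here is a minimal sketch assuming the +/- labels are stored as a Python dict keyed by unordered vertex pairs (`frozenset`). That representation and all function names (`make_labels`, `disagreements`, `agreements`, `set_partitions`, `best_clustering`) are hypothetical and chosen only for illustration. The sketch counts disagreements and agreements for a given clustering. Because every edge is exactly one of the two, the counts always sum to n(n-1)/2, which is why maximizing agreements and minimizing disagreements share the same optimum. It then finds an optimal clustering by exhaustive search over all set partitions. The search takes exponential time and is not the paper's approximation algorithm; it only shows that the number of clusters falls out of the labels rather than being supplied as k.

```python
from itertools import combinations


def make_labels(vertices, positive_pairs):
    """Label every pair of a complete graph: listed pairs get '+', all others '-'."""
    positive = {frozenset(p) for p in positive_pairs}
    return {
        frozenset(p): ('+' if frozenset(p) in positive else '-')
        for p in combinations(vertices, 2)
    }


def disagreements(labels, clustering):
    """Count '-' edges inside clusters plus '+' edges between clusters.

    `clustering` maps each vertex to a cluster id (hypothetical representation).
    """
    bad = 0
    for pair, sign in labels.items():
        u, v = tuple(pair)
        same_cluster = clustering[u] == clustering[v]
        if (same_cluster and sign == '-') or (not same_cluster and sign == '+'):
            bad += 1
    return bad


def agreements(labels, clustering):
    """Count '+' edges inside clusters plus '-' edges between clusters.

    Each edge is exactly one of agreement/disagreement, so the two counts
    sum to the total number of edges, n(n-1)/2.
    """
    return len(labels) - disagreements(labels, clustering)


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or put it in a new singleton block.
        yield [[first]] + partition


def best_clustering(vertices, labels):
    """Exhaustively minimize disagreements; the number of clusters is never an input."""
    best, best_cost = None, None
    for partition in set_partitions(list(vertices)):
        clustering = {v: i for i, block in enumerate(partition) for v in block}
        cost = disagreements(labels, clustering)
        if best_cost is None or cost < best_cost:
            best, best_cost = partition, cost
    return best, best_cost


# Usage: two "similar" pairs, every other pair "different".
V = ['a', 'b', 'c', 'd']
labels = make_labels(V, [('a', 'b'), ('c', 'd')])
partition, cost = best_clustering(V, labels)
print(sorted(sorted(block) for block in partition), cost)  # [['a', 'b'], ['c', 'd']] 0
```

Labeling every pair + yields a single cluster, and labeling every pair - yields n singletons, so the optimal k ranges from 1 to n as stated above. Labels need not be consistent with any clustering. With a-b and b-c marked + and a-c marked -, every partition violates at least one label, and `best_clustering` reports a minimum of 1 disagreement.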