We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or - depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters plus the number of - edges between clusters (equivalently, one that minimizes the number of disagreements: the number of - edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem.

An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median, min-sum, or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and maximizing agreements. For minimizing disagreements, we give a constant-factor approximation. For maximizing agreements, we give a PTAS. We also show how to extend some of these results to graphs with edge labels in [-1, +1] and give some results for the case of random noise.
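
The objective described above can be made concrete with a small sketch. The following Python snippet only evaluates a given clustering against the edge labels; it is not one of the approximation algorithms from the paper. The function names, the dictionary-based input format, and the four-item example are assumptions made for illustration.

```python
from itertools import combinations


def count_disagreements(labels, cluster_of):
    """Count disagreements between a clustering and the +/- edge labels.

    labels:     dict mapping frozenset({u, v}) -> '+' or '-' for every pair
                (hypothetical input format, chosen for this sketch).
    cluster_of: dict mapping each vertex to a cluster identifier.
    """
    disagreements = 0
    for u, v in combinations(sorted(cluster_of), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_plus = labels[frozenset((u, v))] == '+'
        # A '+' edge between clusters or a '-' edge inside a cluster disagrees.
        if is_plus != same_cluster:
            disagreements += 1
    return disagreements


def count_agreements(labels, cluster_of):
    """Every pair either agrees or disagrees, so agreements = C(n, 2) - disagreements."""
    n = len(cluster_of)
    return n * (n - 1) // 2 - count_disagreements(labels, cluster_of)


# Illustrative instance: '+' on (a, b) and (b, c), '-' on all other pairs.
vertices = ['a', 'b', 'c', 'd']
plus_pairs = {frozenset(('a', 'b')), frozenset(('b', 'c'))}
labels = {
    frozenset(pair): ('+' if frozenset(pair) in plus_pairs else '-')
    for pair in combinations(vertices, 2)
}

# Three candidate clusterings with different numbers of clusters.
grouped = {'a': 0, 'b': 0, 'c': 0, 'd': 1}
singletons = {v: i for i, v in enumerate(vertices)}
one_cluster = {v: 0 for v in vertices}

for name, clustering in [('grouped', grouped),
                         ('singletons', singletons),
                         ('one_cluster', one_cluster)]:
    print(name,
          count_disagreements(labels, clustering),
          count_agreements(labels, clustering))
```

In this example the number of clusters is not fixed in advance: the three candidates use two, four, and one cluster respectively, and the edge labels alone determine which scores best. Because agreements and disagreements always sum to the number of pairs, the two objectives share the same optimal clustering, even though they can behave differently from an approximation standpoint.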