We evaluate several heuristic solvers for correlation clustering, the NP-hard problem of partitioning a dataset given pairwise affinities between all points. We experiment on two practical tasks, document clustering and chat disentanglement, to which integer linear programming (ILP) does not scale. On these datasets, we show that the clustering objective often, but not always, correlates with external metrics, and that local search always improves over greedy solutions. We use semi-definite programming (SDP) to provide a tighter bound on the optimal objective, showing that simple algorithms are already close to optimality.
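To make the objective and the greedy-versus-local-search comparison concrete, the following is a minimal Python sketch, not the solvers evaluated in the paper. It assumes affinities are given as a symmetric matrix `weights` in which positive entries favor placing two points together and negative entries favor separating them. The function names (`objective`, `greedy_assign`, `local_search`) and the particular greedy rule (assign each point to the existing cluster with the highest total affinity) are illustrative assumptions.

```python
def objective(weights, labels):
    """Disagreement cost: positive-affinity pairs that are split apart,
    plus the magnitude of negative-affinity pairs that are joined."""
    cost = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            w = weights[i][j]
            same = labels[i] == labels[j]
            if w > 0 and not same:
                cost += w
            elif w < 0 and same:
                cost += -w
    return cost


def greedy_assign(weights):
    """Hypothetical greedy solver: visit points in order and put each one in
    the existing cluster with the highest total affinity, or in a new
    cluster if no existing cluster has positive total affinity."""
    n = len(weights)
    labels = [None] * n
    next_label = 0
    for i in range(n):
        totals = {}
        for j in range(n):
            if j != i and labels[j] is not None:
                totals[labels[j]] = totals.get(labels[j], 0.0) + weights[i][j]
        best = max(totals, key=totals.get, default=None)
        if best is not None and totals[best] > 0:
            labels[i] = best
        else:
            labels[i] = next_label
            next_label += 1
    return labels


def local_search(weights, labels):
    """Hypothetical local search: repeatedly move a single point to the
    cluster (or a fresh singleton) that strictly raises its total
    within-cluster affinity, which strictly lowers the objective."""
    labels = list(labels)
    n = len(labels)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            totals = {}
            for j in range(n):
                if j != i:
                    totals[labels[j]] = totals.get(labels[j], 0.0) + weights[i][j]
            best_label = labels[i]
            best_total = totals.get(labels[i], 0.0)
            candidates = dict(totals)
            candidates[max(labels) + 1] = 0.0  # option: start a new singleton
            for label, total in candidates.items():
                if total > best_total + 1e-12:
                    best_label, best_total = label, total
            if best_label != labels[i]:
                labels[i] = best_label
                improved = True
    return labels
```

Running `local_search` on the output of `greedy_assign` can only keep or lower the value returned by `objective`, since every accepted move strictly reduces the disagreement cost. This mirrors, on a toy scale, the kind of comparison the abstract reports, without reproducing its experiments.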