Clustering is an optimization procedure that partitions a set of elements so as to optimize some criterion, based on a fixed distance metric defined between the elements. Clustering approaches have been widely applied in natural language processing, and it has been shown repeatedly that their success depends on defining a good distance metric: one that is appropriate for both the task and the clustering algorithm used. This paper develops a framework in which clustering is viewed as a learning task and proposes a way to train a distance metric that is appropriate for the chosen clustering algorithm in the context of the given task. Experiments on the entity identification problem show significant performance improvements over state-of-the-art clustering approaches developed for this problem.
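To make the idea of training a metric for a particular clustering algorithm concrete, here is a minimal sketch. It is not the algorithm from the paper. It assumes a weighted L1 distance, a threshold-based single-link clusterer, and a perceptron-style update driven by the pairs the clusterer gets wrong. All names (`pair_features`, `cluster`, `train_metric`), the toy data, and the hyperparameters are hypothetical.

```python
from itertools import combinations

def pair_features(x, y):
    # Per-feature absolute difference between two elements (assumed representation).
    return [abs(a - b) for a, b in zip(x, y)]

def distance(w, x, y):
    # Weighted L1 distance; the weights w are the trainable part of the metric.
    return sum(wi * fi for wi, fi in zip(w, pair_features(x, y)))

def cluster(points, w, threshold):
    # Single-link clustering via union-find: merge any pair closer than threshold.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if distance(w, points[i], points[j]) < threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

def pairwise_errors(labels, gold):
    # Number of pairs on which the predicted and gold partitions disagree.
    return sum((labels[i] == labels[j]) != (gold[i] == gold[j])
               for i, j in combinations(range(len(gold)), 2))

def train_metric(points, gold, threshold=1.0, rate=0.1, epochs=50):
    # Run the clusterer, then adjust the weights on every misclustered pair.
    w = [1.0] * len(points[0])
    for _ in range(epochs):
        labels = cluster(points, w, threshold)
        if pairwise_errors(labels, gold) == 0:
            break
        for i, j in combinations(range(len(points)), 2):
            same_pred = labels[i] == labels[j]
            same_gold = gold[i] == gold[j]
            if same_pred == same_gold:
                continue
            # Wrongly merged pair: increase its distance; wrongly split: decrease it.
            sign = 1.0 if same_pred else -1.0
            phi = pair_features(points[i], points[j])
            w = [max(0.0, wi + sign * rate * fi) for wi, fi in zip(w, phi)]
    return w

# Toy data: feature 0 separates the two gold clusters, feature 1 is noise.
points = [(0.0, 0.0), (0.1, 3.0), (2.0, 0.1), (2.1, 3.1)]
gold = [0, 0, 1, 1]
w = train_metric(points, gold)
labels = cluster(points, w, threshold=1.0)
```

The point of the sketch is that the error signal comes from the output of the clustering algorithm itself, not from a pairwise classifier trained in isolation, so the learned weights are tuned to the clusterer that will use them. On this toy data the update drives down the weight of the noisy feature until the single-link clusterer recovers the gold partition.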