Improving name discrimination: a language salad approach

Authors:
Ted Pedersen;Anagha Kulkarni;Zornitsa Kozareva;Roxana Angheluta;Thamar Solorio
Affiliations:
University of Minnesota, Duluth, MN;University of Minnesota, Duluth, MN;University of Alicante, Alicante, Spain;Attentio SA, Brussels, Belgium;University of Texas at El Paso, El Paso, TX
Venue:
CrossLangInduction '06 Proceedings of the International Workshop on Cross-Language Knowledge Induction
Year:
2006

Abstract

This paper describes a method of discriminating ambiguous names that relies upon features found in corpora of a more abundant language. In particular, we discriminate ambiguous names in Bulgarian, Romanian, and Spanish corpora using information derived from much larger quantities of English data. We also mix occurrences of the ambiguous name found in English with occurrences of the name in the language in which we are trying to discriminate. We refer to this mixture as a language salad, and find that it often performs even better than using only English, or only the language itself, as the source of information for discrimination.
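
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy sentences, the `NAME` placeholder for the ambiguous name, the bag-of-words features, the `min_count` cutoff, and the average-link clustering are all assumptions chosen for illustration. The sketch pools English and target-language contexts into a "salad", selects features from that pool, and then clusters only the target-language contexts.

```python
from collections import Counter
from math import sqrt

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; any tokenizer would do).
    return text.lower().split()

def select_features(contexts, min_count=2):
    # Keep tokens that occur in at least `min_count` contexts of the pool.
    doc_freq = Counter(tok for c in contexts for tok in set(tokenize(c)))
    return sorted(t for t, n in doc_freq.items() if n >= min_count)

def vectorize(context, features):
    # Represent a context as token counts over the selected features.
    counts = Counter(tokenize(context))
    return [counts[f] for f in features]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster(vectors, k):
    # Average-link agglomerative clustering down to k clusters.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return clusters

# Hypothetical toy contexts; NAME stands in for the ambiguous name.
english = [
    "NAME scored a goal in the final match",
    "NAME won the election as president",
    "the match ended after NAME scored",
    "president NAME signed the election law",
]
target = [
    "NAME marcó un gol en el partido final",
    "el presidente NAME ganó la elección",
    "NAME jugó el partido y marcó un gol",
    "la elección del presidente NAME",
]

salad = english + target              # mix both languages' contexts
features = select_features(salad)     # features come from the salad
vectors = [vectorize(c, features) for c in target]
clusters = cluster(vectors, k=2)      # discriminate target contexts only
print(clusters)
```

On this toy data the two sports contexts and the two politics contexts end up in separate clusters. The example only shows how the pooled corpus feeds feature selection; it says nothing about the performance differences reported in the paper.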