An unsupervised language independent method of name discrimination using second order co-occurrence features

Authors:
Ted Pedersen;Anagha Kulkarni;Roxana Angheluta;Zornitsa Kozareva;Thamar Solorio
Affiliations:
University of Minnesota, Duluth;University of Minnesota, Duluth;Katholieke Universiteit Leuven, Belgium;University of Alicante, Spain;University of Texas at El Paso
Venue:
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2006

Citing 7
Cited 11

Automatic word sense discrimination. Computational Linguistics - Special issue on word sense disambiguation.
Entity-based cross-document coreferencing using the Vector Space Model. COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1.
New Techniques for Disambiguation in Natural Language and Their Application to Biological Text. The Journal of Machine Learning Research.
Discriminating among word senses using McQuitty's similarity analysis. NAACLstudent '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Proceedings of the HLT-NAACL 2003 student research workshop - Volume 3.
Category-based pseudowords. NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2.
Unsupervised personal name disambiguation. CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4.
Name discrimination by clustering similar contexts. CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing.

Improved Unsupervised Name Discrimination with Very Wide Bigrams and Automatic Cluster Stopping. CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing.
Unsupervised Discrimination of Person Names in Web Contexts. CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing.
Classifying Biomedical Abstracts Using Committees of Classifiers and Collective Ranking Techniques. Canadian AI '09 Proceedings of the 22nd Canadian Conference on Artificial Intelligence: Advances in Artificial Intelligence.
Improving name discrimination: a language salad approach. CrossLangInduction '06 Proceedings of the International Workshop on Cross-Language Knowledge Induction.
Multilingual name disambiguation with semantic information. TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue.
Domain information for fine-grained person name categorization. CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing.
Large-scale cross-document coreference using distributed inference and hierarchical models. HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1.
Unsupervised name ambiguity resolution using a generative model. EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP.
Offensive language detection using multi-level classification. AI'10 Proceedings of the 23rd Canadian conference on Advances in Artificial Intelligence.
Automatic annotation of ambiguous personal names on the web. Computational Intelligence.
Accurate unsupervised joint named-entity extraction from unaligned parallel text. NEWS '12 Proceedings of the 4th Named Entity Workshop.

Abstract

Previous work by Pedersen, Purandare and Kulkarni (2005) resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second order co-occurrence features. These contexts are then clustered to identify which of them are associated with different underlying named entities. The method also extracts descriptive and discriminating bigrams from each discovered cluster to serve as identifying labels. These methods have been shown to perform well on English text, and we believe them to be language independent, since they rely on lexical features and use no syntactic features or external knowledge sources. In this paper we apply the methodology in exactly the same way to Bulgarian, English, Romanian, and Spanish corpora. We find that it attains discrimination accuracy consistently well above that of a majority classifier, which supports the hypothesis that the method is language independent.
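
The idea of a second order co-occurrence representation and of the majority classifier baseline can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy contexts, and the gold labels are hypothetical, the co-occurrence counts are raw rather than statistically filtered, and the clustering step is left out (only the pairwise similarity that a clustering algorithm would consume is shown). Each context is assumed to be a list of tokens with the ambiguous name already removed.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(contexts):
    """First order step: for each word, count the words it shares a context with."""
    vectors = defaultdict(Counter)
    for tokens in contexts:
        for a, b in combinations(sorted(set(tokens)), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def second_order_vector(tokens, vectors):
    """Second order step: average the co-occurrence vectors of the context's words."""
    total = Counter()
    for word in tokens:
        total.update(vectors.get(word, {}))
    n = max(len(tokens), 1)
    return {feature: count / n for feature, count in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def majority_baseline(gold_labels):
    """Accuracy obtained by assigning every context to the most frequent entity."""
    counts = Counter(gold_labels)
    return counts.most_common(1)[0][1] / len(gold_labels)


# Hypothetical toy data: five contexts of one ambiguous name, two entities.
contexts = [
    ["president", "election"],  # entity A
    ["senate", "vote"],         # entity A, shares no word with the first context
    ["election", "vote"],       # entity A, links the two contexts above
    ["guitar", "album"],        # entity B
    ["album", "tour"],          # entity B
]
gold = ["A", "A", "A", "B", "B"]

vectors = cooccurrence_vectors(contexts)
reps = [second_order_vector(tokens, vectors) for tokens in contexts]

same_entity = cosine(reps[0], reps[1])       # contexts with no words in common
different_entity = cosine(reps[0], reps[3])  # contexts from different entities
baseline = majority_baseline(gold)

print(same_entity, different_entity, baseline)
```

In this toy setting the first two contexts have no word in common, yet their second order vectors overlap because their words co-occur with shared third words. Since only token co-occurrence is used, nothing in the sketch depends on the language of the text, which is the property the abstract puts to the test.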