In this paper, we describe a context-based method for semantically tagging unknown proper nouns (U-PNs) in corpora. Like many others, our system relies on a gazetteer and a set of context-dependent heuristics to classify proper nouns. However, proper nouns are an open-ended class: when parsing new fragments of a corpus, even within the same language domain, we can expect to encounter several proper nouns that cannot be semantically tagged. The algorithm that we propose assigns an entity type to an unknown PN based on the analysis of syntactically and semantically similar contexts already seen in the application corpus. The performance of the algorithm is evaluated not only in terms of precision, following the tradition of the MUC conferences, but also in terms of information gain, an information-theoretic measure that takes into account the complexity of the classification task.
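To make the two ideas in the abstract concrete, the following is a minimal Python sketch and not the system described in the paper. It assumes that a context is a set of string features, that similarity is Jaccard overlap, and that a correct decision on a class with prior p is credited with -log2(p) bits while a wrong one earns nothing. The function names, the feature strings, and the scoring rule are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Overlap between two context feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tag_unknown(context, seen):
    """Pick the entity type whose already-seen contexts are most similar.

    `seen` is a list of (feature_set, entity_type) pairs for proper nouns
    that were tagged earlier in the corpus. Returns None when no seen
    context shares any feature with `context`.
    """
    scores = defaultdict(float)
    for features, entity_type in seen:
        scores[entity_type] += jaccard(context, features)
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


def precision_and_bits(gold, predicted):
    """Return (precision, mean bits gained) over the attempted decisions.

    Assumption (not taken from the paper): class priors are estimated from
    `gold`, a correct decision on a class with prior p earns -log2(p) bits,
    and a wrong decision earns nothing. Items predicted as None are skipped.
    """
    priors = {c: n / len(gold) for c, n in Counter(gold).items()}
    pairs = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    if not pairs:
        return 0.0, 0.0
    correct = [g for g, p in pairs if g == p]
    bits = sum(-math.log2(priors[c]) for c in correct)
    return len(correct) / len(pairs), bits / len(pairs)


# Hypothetical contexts for proper nouns already tagged in the corpus.
seen_contexts = [
    ({"subj_of:announce", "appos:company"}, "ORG"),
    ({"subj_of:say", "appos:president"}, "PERSON"),
    ({"subj_of:announce", "mod:inc"}, "ORG"),
]
guess = tag_unknown({"subj_of:announce"}, seen_contexts)

# Two taggers with the same precision but different information gain.
gold = ["ORG", "ORG", "ORG", "PERSON"]
always_org = precision_and_bits(gold, ["ORG", "ORG", "ORG", "ORG"])
mixed = precision_and_bits(gold, ["ORG", "ORG", "PERSON", "PERSON"])
```

In this toy data both taggers get three of four decisions right, but the second one earns more bits because it recovers the rarer class. This is the sense in which an information-based score reflects the difficulty of the task where precision alone does not.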