Learning to recognize names across languages

Authors:
Anthony F. Gallippi
Affiliations:
University of Southern California, University Park, Los Angeles, CA
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Year:
1996

Abstract

The development of natural language processing (NLP) systems that perform machine translation (MT) and information retrieval (IR) has highlighted the need for the automatic recognition of proper names. Various name recognizers have been developed, but they are too limited: some recognize only one name class, and all are language-specific. This work develops an approach to multilingual name recognition that allows a system optimized for one language to be ported to another with little additional effort or resources. An initial core set of linguistic features, useful for name recognition in most languages, is identified. When the system is ported to a new language, these features must be converted (partly by hand, partly from on-line lists), after which machine learning (ML) techniques build decision trees that map features to name classes. A system initially optimized for English has been successfully ported to Spanish and Japanese. Only a few days of human effort per new language yields performance comparable to that of the best current English systems.
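
The abstract does not list the features or name the tree-induction algorithm, so the following is only a minimal sketch of the general idea of learning a decision tree that maps token features to name classes. It assumes an ID3-style learner driven by information gain, swapped in here for illustration. The three boolean features, the name classes, and the toy training data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical features and toy training data (assumptions, not from the paper).
FEATURES = ["capitalized", "in_place_list", "prev_is_title"]
TRAIN = [
    ({"capitalized": True,  "prev_is_title": True,  "in_place_list": False}, "PERSON"),
    ({"capitalized": True,  "prev_is_title": False, "in_place_list": True},  "LOCATION"),
    ({"capitalized": True,  "prev_is_title": False, "in_place_list": False}, "ORGANIZATION"),
    ({"capitalized": False, "prev_is_title": False, "in_place_list": False}, "NONE"),
    ({"capitalized": True,  "prev_is_title": True,  "in_place_list": True},  "PERSON"),
    ({"capitalized": False, "prev_is_title": False, "in_place_list": True},  "NONE"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples, feature):
    """Reduction in label entropy obtained by splitting on one feature."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {feats[feature] for feats, _ in examples}:
        subset = [label for feats, label in examples if feats[feature] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(examples, features):
    """Recursively build a tree; leaves are class names, inner nodes are dicts."""
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(examples, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {feats[best] for feats, _ in examples}:
        subset = [(feats, label) for feats, label in examples if feats[best] == value]
        branches[value] = build_tree(subset, remaining)
    return {"feature": best, "branches": branches, "default": majority}


def classify(tree, feats):
    """Walk the tree until a leaf is reached; use the node default for unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(feats[tree["feature"]], tree["default"])
    return tree


tree = build_tree(TRAIN, FEATURES)
for feats, label in TRAIN:
    print(label, "->", classify(tree, feats))
```

In a setup like this, the learner itself contains nothing language-specific. Under the approach the abstract describes, porting would mean replacing the feature extraction (for example, the word lists behind a feature such as `in_place_list`) and retraining on data in the new language.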