Automatic semantic tagging of unknown proper names

Authors:
Alessandro Cucchiarelli;Danilo Luzi;Paola Velardi
Affiliations:
Università di Ancona, Istituto di Informatica, Ancona, Italia;Università di Ancona, Istituto di Informatica, Ancona, Italia;Università di Roma 'La Sapienza', Roma, Italia
Venue:
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Year:
1998

Abstract

Implemented methods for proper name (PN) recognition rely on large gazetteers of common proper nouns and a set of heuristic rules (e.g. Mr. as an indicator of a PERSON entity type). Though the performance of current PN recognizers is very high (over 90%), the problem is by no means solved. Existing systems perform extremely well on newswire corpora by virtue of the availability of large gazetteers and rule bases designed for specific tasks (e.g. recognition of Organization and Person entity types as specified in recent Message Understanding Conferences, MUC). However, large gazetteers are not available for most languages or for applications other than newswire texts and, in any case, proper nouns are an open class.

In this paper we describe a context-based method to assign an entity type to unknown PNs. Like many others, our system relies on a gazetteer and a set of context-dependent heuristics to classify proper nouns. However, due to the unavailability of large gazetteers for Italian, over 20% of detected PNs cannot be semantically tagged. The algorithm that we propose assigns an entity type to an unknown PN based on the analysis of syntactically and semantically similar contexts already seen in the application corpus.

The performance of the algorithm is evaluated not only in terms of precision, following the tradition of the MUC conferences, but also in terms of Information Gain, an information-theoretic measure that takes into account the complexity of the classification task.
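
The idea of tagging an unknown proper name from similar, already-seen contexts can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the context representation or the similarity measure, so the sketch assumes contexts are sets of feature strings and substitutes plain Jaccard overlap for the syntactic and semantic similarity the paper uses. The feature strings, the `TAGGED_CONTEXTS` data, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: contexts of proper names that the gazetteer and
# heuristics have already tagged. Each context is a set of illustrative
# feature strings (assumed format, not taken from the paper).
TAGGED_CONTEXTS = [
    ({"subj_of:dichiarare", "mod:presidente"}, "PERSON"),
    ({"subj_of:dichiarare", "mod:ministro"}, "PERSON"),
    ({"subj_of:acquistare", "mod:società"}, "ORGANIZATION"),
    ({"obj_of:acquistare", "mod:società"}, "ORGANIZATION"),
    ({"prep_in:vivere"}, "LOCATION"),
]


def jaccard(a, b):
    """Set overlap in [0, 1]; a stand-in for the paper's similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tag_unknown(contexts, tagged=TAGGED_CONTEXTS):
    """Return the entity type whose known contexts best match `contexts`.

    `contexts` holds the contexts in which the unknown name occurs.
    Returns None when no known context shares any feature with them.
    """
    scores = Counter()
    for ctx in contexts:
        for seen, entity_type in tagged:
            # Each known context votes for its type, weighted by similarity.
            scores[entity_type] += jaccard(ctx, seen)
    if not scores:
        return None
    best_type, best_score = scores.most_common(1)[0]
    return best_type if best_score > 0 else None


if __name__ == "__main__":
    print(tag_unknown([{"subj_of:dichiarare", "mod:presidente"}]))
    print(tag_unknown([{"mod:società"}]))
```

Summing similarity-weighted votes over every occurrence of the unknown name lets several weak contextual cues reinforce each other; returning `None` when nothing overlaps leaves the name untagged rather than forcing a guess.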