Implemented methods for proper name recognition rely on large gazetteers of common proper nouns and a set of heuristic rules (e.g. "Mr." as an indicator of a PERSON entity type). Though the performance of current proper name (PN) recognizers is very high (over 90%), it is important to note that this is by no means a "solved problem". Existing systems perform extremely well on newswire corpora by virtue of the availability of large gazetteers and rule bases designed for specific tasks (e.g. recognition of Organization and Person entity types as specified in recent Message Understanding Conferences, MUC).

However, large gazetteers are not available for most languages or for applications other than newswire texts and, in any case, proper nouns are an open class.

In this paper we describe a context-based method to assign an entity type to unknown PNs. Like many others, our system relies on a gazetteer and a set of context-dependent heuristics to classify proper nouns. However, due to the unavailability of large gazetteers for Italian, over 20% of the detected PNs cannot be semantically tagged.

The algorithm that we propose assigns an entity type to an unknown PN based on the analysis of syntactically and semantically similar contexts already seen in the application corpus.

The performance of the algorithm is evaluated not only in terms of precision, following the tradition of the MUC conferences, but also in terms of Information Gain, an information-theoretic measure that takes into account the complexity of the classification task.
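
To illustrate why an information-theoretic measure can say more than precision alone, the following is a minimal sketch of one such measure, the information score of Kononenko and Bratko. That the abstract's Information Gain is computed this way is an assumption, not something the text above states. The function names, the estimation of class priors from the gold labels, and the treatment of each decision as a hard one (probability 1 for the predicted type, 0 for all others) are likewise illustrative choices.

```python
import math
from collections import Counter


def information_score(prior, posterior):
    """Information score in bits for one instance.

    `prior` and `posterior` are the probabilities of the TRUE class
    before and after seeing the classifier's answer. Assumed formula:
    Kononenko and Bratko's information score.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if posterior >= prior:
        # The classifier moved belief toward the true class: reward it,
        # and reward it more when the true class is rare.
        return -math.log2(prior) + math.log2(posterior)
    # The classifier moved belief away from the true class: penalize it.
    return math.log2(1.0 - prior) - math.log2(1.0 - posterior)


def average_information_score(gold, predicted):
    """Mean information score over hard (single-label) decisions.

    Class priors are estimated from the gold labels (an assumption);
    a correct decision gives posterior 1.0, a wrong one posterior 0.0.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and aligned")
    counts = Counter(gold)
    total = 0.0
    for g, p in zip(gold, predicted):
        prior = counts[g] / len(gold)
        total += information_score(prior, 1.0 if g == p else 0.0)
    return total / len(gold)


def precision(gold, predicted):
    """Fraction of decisions that match the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```

Under these assumptions, a correct decision on a rare entity type earns more bits than a correct decision on a frequent one, so a classifier that always answers with the majority type is credited less than its precision alone would suggest.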