The web is not a person, Berners-Lee is not an organization, and African-Americans are not locations: an analysis of the performance of named-entity recognition

Authors:
Robert Krovetz;Paul Deane;Nitin Madnani
Affiliations:
Lexical Research, Hillsborough, NJ; Educational Testing Service, Princeton, NJ; Educational Testing Service, Princeton, NJ
Venue:
MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Year:
2011



Abstract

Most work on the evaluation of named-entity recognition has been done in the context of competitions, as part of Information Extraction. There has been little work on any form of extrinsic evaluation, or on how one tagger compares with another on the major classes: PERSON, ORGANIZATION, and LOCATION. We report on a comparison of three state-of-the-art named-entity taggers: Stanford, LBJ, and IdentiFinder. The taggers were compared with respect to 1) the agreement rate on the classification of entities by class, and 2) the percentage of ambiguous entities (those belonging to more than one class) that co-occur in a document. We found that agreement between the taggers ranged from 34% to 58%, depending on the class, and that more than 40% of the globally ambiguous entities co-occur within the same document. We also propose a unit test based on the problems we encountered.
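
The two comparison measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: agreement is taken to be the overlap (intersection over union) of the entity strings that two taggers assign to a class, and an entity is taken to be "globally ambiguous" when a tagger gives it more than one class across the corpus. The function names and the input layout (lists of `(entity, class)` pairs) are hypothetical and are not taken from the paper or from any tagger's API.

```python
from collections import defaultdict


def agreement_rate(tags_a, tags_b, entity_class):
    """Overlap of the entities two taggers assign to entity_class.

    ASSUMPTION: agreement is intersection over union of entity strings;
    the paper may define it differently.
    tags_a, tags_b: iterables of (entity, class) pairs.
    """
    in_a = {entity for entity, cls in tags_a if cls == entity_class}
    in_b = {entity for entity, cls in tags_b if cls == entity_class}
    union = in_a | in_b
    # If neither tagger used the class, treat it as full agreement.
    return len(in_a & in_b) / len(union) if union else 1.0


def ambiguous_cooccurrence_rate(docs):
    """Share of globally ambiguous entities that are ambiguous inside
    at least one single document.

    ASSUMPTION: "ambiguous" means tagged with more than one class.
    docs: mapping of doc_id -> iterable of (entity, class) pairs.
    """
    global_classes = defaultdict(set)
    ambiguous_in_some_doc = set()
    for pairs in docs.values():
        per_doc = defaultdict(set)
        for entity, cls in pairs:
            per_doc[entity].add(cls)
            global_classes[entity].add(cls)
        # Entities that receive several classes within this one document.
        ambiguous_in_some_doc.update(
            entity for entity, classes in per_doc.items() if len(classes) > 1
        )
    globally_ambiguous = {
        entity for entity, classes in global_classes.items() if len(classes) > 1
    }
    if not globally_ambiguous:
        return 0.0
    return len(ambiguous_in_some_doc) / len(globally_ambiguous)
```

Comparing entity strings rather than token offsets keeps the sketch short; a real comparison would probably also need to align entity boundaries between taggers, which this sketch ignores.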