The web is not a person, Berners-Lee is not an organization, and African-Americans are not locations: an analysis of the performance of named-entity recognition

  • Authors:
  • Robert Krovetz; Paul Deane; Nitin Madnani

  • Affiliations:
  • Lexical Research, Hillsborough, NJ; Educational Testing Service, Princeton, NJ; Educational Testing Service, Princeton, NJ

  • Venue:
  • MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
  • Year:
  • 2011

Abstract

Most evaluation of named-entity recognition has been carried out in the context of competitions, as a part of Information Extraction. There has been little work on any form of extrinsic evaluation, or on how one tagger compares with another on the major classes: PERSON, ORGANIZATION, and LOCATION. We report on a comparison of three state-of-the-art named-entity taggers: Stanford, LBJ, and IdentiFinder. The taggers were compared with respect to: 1) the agreement rate on the classification of entities by class, and 2) the percentage of ambiguous entities (those belonging to more than one class) that co-occur in a document. We found that the agreement between the taggers ranged from 34% to 58%, depending on the class, and that more than 40% of the globally ambiguous entities co-occur within the same document. We also propose a unit test based on the problems we encountered.
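
The two comparison measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give exact definitions, so the formulas below are assumptions: per-class agreement is taken to be the overlap of the entity sets two taggers assign to a class (intersection over union), and the co-occurrence figure is taken to be the share of corpus-wide ambiguous entities that receive more than one class inside a single document. The function names and the toy data are hypothetical and are not drawn from the paper.

```python
from collections import defaultdict


def agreement_by_class(tags_a, tags_b):
    """Per-class agreement between two taggers.

    tags_a, tags_b: dicts mapping an entity string to the class label
    assigned by each tagger. Agreement for a class is ASSUMED to be
    |A & B| / |A | B| over the entities given that class.
    """
    by_class_a = defaultdict(set)
    by_class_b = defaultdict(set)
    for entity, label in tags_a.items():
        by_class_a[label].add(entity)
    for entity, label in tags_b.items():
        by_class_b[label].add(entity)

    rates = {}
    # Every label seen by either tagger has a non-empty union.
    for label in set(by_class_a) | set(by_class_b):
        union = by_class_a[label] | by_class_b[label]
        both = by_class_a[label] & by_class_b[label]
        rates[label] = len(both) / len(union)
    return rates


def cooccurring_ambiguous(doc_tags):
    """Fraction of globally ambiguous entities that are also ambiguous
    within at least one single document.

    doc_tags: dict mapping a document id to a list of (entity, label)
    pairs. "Globally ambiguous" is ASSUMED to mean: tagged with more
    than one class somewhere in the corpus.
    """
    global_labels = defaultdict(set)
    for pairs in doc_tags.values():
        for entity, label in pairs:
            global_labels[entity].add(label)
    ambiguous = {e for e, labels in global_labels.items() if len(labels) > 1}
    if not ambiguous:
        return 0.0

    within_doc = set()
    for pairs in doc_tags.values():
        local_labels = defaultdict(set)
        for entity, label in pairs:
            local_labels[entity].add(label)
        within_doc |= {e for e, labels in local_labels.items() if len(labels) > 1}
    return len(within_doc) / len(ambiguous)


# Toy inputs (hypothetical, for illustration only).
tagger_a = {"Princeton": "LOCATION", "Berners-Lee": "PERSON", "Web": "PERSON"}
tagger_b = {"Princeton": "LOCATION", "Berners-Lee": "ORGANIZATION", "Web": "PERSON"}
rates = agreement_by_class(tagger_a, tagger_b)

docs = {
    "d1": [("Washington", "PERSON"), ("Washington", "LOCATION")],
    "d2": [("Jordan", "PERSON")],
    "d3": [("Jordan", "LOCATION")],
}
share = cooccurring_ambiguous(docs)
```

On this toy data the two taggers agree fully on LOCATION, on half of the PERSON entities, and on none of the ORGANIZATION entities, while one of the two globally ambiguous entities is also ambiguous within a single document. Other definitions of agreement (for example, normalizing by one tagger's output rather than by the union) would give different numbers.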