Semi-automated named entity annotation

  • Authors: Kuzman Ganchev, Fernando Pereira, Mark Mandel, Steven Carroll, Peter White
  • Affiliations: University of Pennsylvania, Philadelphia, PA (Ganchev, Pereira, Mandel); Children's Hospital of Philadelphia, Philadelphia, PA (Carroll, White)
  • Venue: LAW '07 Proceedings of the Linguistic Annotation Workshop
  • Year: 2007

Abstract

We investigate a way to partially automate corpus annotation for named entity recognition by requiring only binary decisions from an annotator. Our approach is based on a linear sequence model trained with a k-best MIRA learning algorithm. We ask an annotator to decide whether each mention produced by a high-recall tagger is a true mention or a false positive. We conclude that our approach can reduce the effort of extending a seed training corpus by up to 58%.
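
The binary-decision workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Mention` type, the `review_candidates` function, the example sentence, and the `GENE` label are all hypothetical, and the candidate list stands in for the output of a high-recall tagger (the MIRA-trained sequence model itself is not shown). The annotator is modeled as a callback that answers yes or no for each proposed mention.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Mention:
    """A candidate entity mention (hypothetical type for this sketch)."""
    start: int   # index of the first token in the mention
    end: int     # index one past the last token
    label: str   # entity type proposed by the tagger


def review_candidates(
    tokens: Sequence[str],
    candidates: Sequence[Mention],
    is_true_mention: Callable[[Sequence[str], Mention], bool],
) -> List[Mention]:
    """Keep only the candidates the annotator confirms.

    The annotator never draws spans or picks labels; each candidate
    receives a single accept/reject decision.
    """
    return [m for m in candidates if is_true_mention(tokens, m)]


# Assumed example: one sentence and two candidates, as if proposed by a
# tagger tuned for high recall (so false positives are expected).
tokens = "Mutations in BRCA1 were found in patient samples".split()
candidates = [Mention(2, 3, "GENE"), Mention(6, 7, "GENE")]

# Stand-in for the human annotator: accepts only the first candidate.
confirmed = {Mention(2, 3, "GENE")}
accepted = review_candidates(tokens, candidates, lambda t, m: m in confirmed)

for m in accepted:
    print(" ".join(tokens[m.start:m.end]), m.label)
```

Because the annotator only filters, any mention the tagger fails to propose cannot be recovered at this stage, which is why the abstract specifies a high-recall tagger.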