Accelerating the annotation of sparse named entities by dynamic sentence selection

  • Authors:
  • Yoshimasa Tsuruoka; Jun'ichi Tsujii; Sophia Ananiadou

  • Affiliations:
  • The University of Manchester, UK; The University of Manchester, UK and The University of Tokyo, Japan and National Centre for Text Mining (NaCTeM), Manchester, UK; The University of Manchester, UK and National Centre for Text Mining (NaCTeM), Manchester, UK

  • Venue:
  • BioNLP '08 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
  • Year:
  • 2008

Abstract

This paper presents an active learning-like framework for reducing the human effort required to annotate named entities in a corpus. In this framework, annotation proceeds as an iterative, interactive process between a human annotator and a probabilistic named entity tagger. At each iteration, the tagger selects the sentences most likely to contain named entities of the target category and presents them to the annotator. The process is repeated until the estimated coverage reaches the desired level. Unlike active learning approaches, our framework produces a named entity corpus that is free from the sampling bias introduced by the active strategy. We evaluated our framework by simulating the annotation process on two named entity corpora and show that our approach can drastically reduce the number of sentences that must be annotated when the target named entities are sparse.
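
The iterative loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `select_and_annotate`, the `score` and `annotate` callables, the batch size, and the coverage estimate (entities found so far divided by entities found plus the tagger's expected number of entity-bearing sentences still unannotated) are all assumptions made for illustration. A real system would retrain a probabilistic tagger on the labeled sentences at each iteration; here a toy scorer stands in for it.

```python
from typing import Callable, List, Sequence, Tuple

Labeled = List[Tuple[str, bool]]  # (sentence, contains a target entity?)


def select_and_annotate(
    sentences: Sequence[str],
    score: Callable[[str, Labeled], float],  # hypothetical tagger: P(sentence has an entity)
    annotate: Callable[[str], bool],         # hypothetical human annotator
    target_coverage: float = 0.9,
    batch_size: int = 2,
) -> Tuple[Labeled, float]:
    """Annotate the most promising sentences until estimated coverage is reached."""
    labeled: Labeled = []
    remaining = list(range(len(sentences)))
    while True:
        # Re-score the unannotated sentences given everything labeled so far.
        probs = {i: score(sentences[i], labeled) for i in remaining}
        found = sum(1 for _, has_entity in labeled if has_entity)
        expected_left = sum(probs.values())
        # Assumed coverage estimate: found / (found + expected remaining).
        total = found + expected_left
        coverage = found / total if total > 0 else 1.0
        if coverage >= target_coverage or not remaining:
            return labeled, coverage
        # Present the sentences most likely to contain a target entity.
        remaining.sort(key=lambda i: probs[i], reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        labeled.extend((sentences[i], annotate(sentences[i])) for i in batch)


# Toy usage: sentences containing a digit stand in for "contains a gene name".
corpus = [
    "p53 binds DNA",
    "The weather is nice",
    "BRCA1 is a gene",
    "We thank the reviewers",
    "No entity here",
    "Results are shown",
]
has_digit = lambda s: any(ch.isdigit() for ch in s)
toy_score = lambda s, labeled: 0.9 if has_digit(s) else 0.05
annotated, estimated_coverage = select_and_annotate(corpus, toy_score, has_digit)
```

In this toy run only the two entity-bearing sentences are shown to the annotator before the estimated coverage passes the 0.9 target, which mirrors the intended saving when entities are sparse.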