On proper unit selection in active learning: co-selection effects for named entity recognition

  • Authors:
  • Katrin Tomanek;Florian Laws;Udo Hahn;Hinrich Schütze

  • Affiliations:
  • Friedrich-Schiller-Universität Jena, Jena, Germany;Universität Stuttgart, Germany;Friedrich-Schiller-Universität Jena, Jena, Germany;Universität Stuttgart, Germany

  • Venue:
  • HLT '09 Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
  • Year:
  • 2009

Abstract

Active learning is an effective method for creating training sets cheaply, but it is a biased sampling process and, in many applications, fails to explore large regions of the instance space. This can result in a missed cluster effect, which significantly lowers recall and slows down learning for infrequent classes. We show that missed clusters can be avoided in sequence classification tasks by using sentences as natural multi-instance units for labeling. Co-selection of the other tokens within a selected sentence provides an implicit exploratory component: for named entity recognition on two corpora, we found that entity classes co-occur within sentences with sufficient frequency.
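
The co-selection idea can be illustrated with a minimal sketch. The abstract does not state the utility function the authors use, so this example assumes a hypothetical scoring rule: each sentence is ranked by its single most uncertain token, and every other token in a selected sentence counts as co-selected. The function names and the toy uncertainty values are illustrative only, not taken from the paper.

```python
from typing import List, Tuple


def select_sentences(token_uncertainty: List[List[float]], batch_size: int) -> List[int]:
    """Return indices of the sentences whose most uncertain token scores highest.

    Assumption: sentence utility = max token uncertainty (not confirmed by the source).
    """
    scored = [(max(sent), idx) for idx, sent in enumerate(token_uncertainty) if sent]
    # Highest uncertainty first; ties broken by the lower sentence index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:batch_size]]


def co_selected_tokens(
    token_uncertainty: List[List[float]], selected: List[int]
) -> List[Tuple[int, int]]:
    """List (sentence, token) positions labeled only because they share a
    sentence with the token that triggered the selection."""
    positions = []
    for idx in selected:
        sent = token_uncertainty[idx]
        trigger = sent.index(max(sent))  # token that caused the sentence to be picked
        positions.extend((idx, j) for j in range(len(sent)) if j != trigger)
    return positions


# Toy pool: one list of per-token uncertainty scores per sentence.
pool = [
    [0.1, 0.9, 0.2],
    [0.3, 0.2],
    [0.05, 0.6, 0.1, 0.4],
]
chosen = select_sentences(pool, batch_size=2)
extra = co_selected_tokens(pool, chosen)
print(chosen)
print(extra)
```

In this toy pool, the low-uncertainty tokens in the chosen sentences would never be picked by a purely token-level selector, yet they are labeled anyway. That is the exploratory side effect the abstract attributes to sentence-level units.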