Reducing class imbalance during active learning for named entity annotation

  • Authors:
  • Katrin Tomanek; Udo Hahn

  • Affiliations:
  • Friedrich-Schiller-Universität Jena, Jena, Germany; Friedrich-Schiller-Universität Jena, Jena, Germany

  • Venue:
  • Proceedings of the Fifth International Conference on Knowledge Capture
  • Year:
  • 2009

Abstract

In many natural language processing tasks, the classes of interest are heavily imbalanced in the underlying data set, and classifiers trained on such skewed data tend to perform poorly on low-frequency classes. We introduce and compare different approaches to reduce class imbalance by design within the context of active learning (AL). Our goal is to compile more balanced data sets up front, at annotation time, when AL is used as a strategy to acquire training material. We situate our approach in the context of named entity recognition. Our experiments reveal that we can indeed reduce class imbalance and increase classifier performance on minority classes while preserving good overall performance in terms of macro F-score.