Multi-criteria-based strategy to stop active learning for data annotation

  • Authors: Jingbo Zhu; Huizhen Wang; Eduard Hovy
  • Affiliations: Northeastern University, Shenyang, Liaoning, P.R. China; Northeastern University, Shenyang, Liaoning, P.R. China; University of Southern California, Marina del Rey, CA
  • Venue: COLING '08, Proceedings of the 22nd International Conference on Computational Linguistics, Volume 1
  • Year: 2008

Abstract

In this paper, we address the issue of deciding when to stop active learning for building a labeled training corpus. First, this paper presents a new stopping criterion, classification-change, which considers each unlabeled example's potential to change the decision boundary. Second, a multi-criteria-based combination strategy is proposed to solve the problem of predefining an appropriate threshold for each confidence-based stopping criterion, such as max-confidence, min-error, and overall-uncertainty. Finally, we examine the effectiveness of these stopping criteria with uncertainty sampling and heterogeneous uncertainty sampling for active learning. Experimental results show that these stopping criteria work well on the evaluation data sets, and that the combination strategies outperform the individual criteria.
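To make the criteria named in the abstract concrete, the sketch below illustrates how confidence-based stopping rules and the classification-change rule might be computed over an unlabeled pool. The function names, thresholds, and exact formulations here are illustrative assumptions for exposition, not the paper's definitions; the paper specifies its own thresholds and combination strategy.

```python
import math

def entropy(probs):
    """Shannon entropy of one example's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def max_confidence_stop(pool_probs, threshold=0.9):
    """Illustrative max-confidence criterion: stop once even the least
    certain unlabeled example has a top posterior above the threshold."""
    return all(max(p) >= threshold for p in pool_probs)

def overall_uncertainty_stop(pool_probs, threshold=0.1):
    """Illustrative overall-uncertainty criterion: stop once the average
    entropy over the unlabeled pool drops below the threshold."""
    avg = sum(entropy(p) for p in pool_probs) / len(pool_probs)
    return avg <= threshold

def classification_change_stop(prev_labels, curr_labels):
    """Illustrative classification-change criterion: stop once no
    unlabeled example's predicted label changed between two consecutive
    active-learning iterations (the classifier has stabilized)."""
    return all(a == b for a, b in zip(prev_labels, curr_labels))
```

The confidence-based rules require a hand-picked threshold per task, which is exactly the difficulty the paper's multi-criteria combination strategy is meant to remove; the classification-change rule needs no threshold, only the predictions from two successive iterations.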