Multi-criteria-based active learning for named entity recognition

  • Authors:
  • Dan Shen;Jie Zhang;Jian Su;Guodong Zhou;Chew-Lim Tan

  • Affiliations:
  • Institute for Infocomm Technology, Singapore and National University of Singapore, Singapore and Universität des Saarlandes, Germany;Institute for Infocomm Technology, Singapore and National University of Singapore, Singapore;Institute for Infocomm Technology, Singapore;Institute for Infocomm Technology, Singapore;National University of Singapore, Singapore

  • Venue:
  • ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
  • Year:
  • 2004

Abstract

In this paper, we propose a multi-criteria-based active learning approach and apply it effectively to named entity recognition. Active learning aims to minimize human annotation effort by selecting which examples to label. To maximize the contribution of the selected examples, we consider multiple criteria (informativeness, representativeness, and diversity) and propose measures to quantify them. We further combine all the criteria using two selection strategies, both of which incur lower labeling cost than single-criterion-based methods. Named entity recognition results on both MUC-6 and GENIA show that the labeling cost can be reduced by at least 80% without degrading performance.
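
The abstract does not spell out the measures or the two selection strategies, so the following is only a minimal sketch of how three such criteria might be combined in one batch-selection step. Everything in it is an assumption made for illustration, not the authors' method: label-distribution entropy stands in for informativeness, mean cosine similarity to the rest of the pool stands in for representativeness, a similarity cap inside the batch stands in for diversity, and the names select_batch, lam and max_sim are hypothetical.

```python
import math


def entropy(probs):
    """Informativeness proxy (assumed): Shannon entropy of a label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density(i, vectors):
    """Representativeness proxy (assumed): mean similarity to the other pool examples."""
    others = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return sum(others) / len(others) if others else 0.0


def select_batch(probs, vectors, batch_size, lam=0.6, max_sim=0.9):
    """Pick up to batch_size pool indices to send for annotation.

    probs[i]   : model's label distribution for unlabeled example i
    vectors[i] : feature vector for example i
    lam        : hypothetical weight trading informativeness against density
    max_sim    : hypothetical cap on similarity between two batch members
    """
    # Score each example by a weighted sum of the two per-example criteria.
    scores = [
        lam * entropy(p) + (1 - lam) * density(i, vectors)
        for i, p in enumerate(probs)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Diversity: skip a candidate that is too similar to one already chosen.
    batch = []
    for i in ranked:
        if len(batch) == batch_size:
            break
        if all(cosine(vectors[i], vectors[j]) < max_sim for j in batch):
            batch.append(i)
    return batch
```

In a full active learning loop, the returned indices would be labeled by an annotator, added to the training set, and the model retrained before the next round of selection. The linear weighting and the fixed similarity cap are design choices of this sketch only; other ways of combining the criteria are equally plausible.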