CRF-based active learning for Chinese named entity recognition

  • Authors:
  • Lin Yao;Chengjie Sun;Shaofeng Li;Xiaolong Wang;Xuan Wang

  • Affiliations:
  • Computer Science Department, HITSGS, ShenZhen, China;School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China;Computer Science Department, HITSGS, ShenZhen, China;Computer Science Department, HITSGS, ShenZhen, China;Computer Science Department, HITSGS, ShenZhen, China

  • Venue:
  • SMC'09: Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics
  • Year:
  • 2009

Abstract

Conditional Random Fields (CRFs) have been applied to many sequence labeling tasks with excellent results. However, as a supervised model, a CRF depends heavily on a large amount of training data. Active learning offers an alternative to relying on a large, randomly sampled training set: rather than sampling at random, the learner actively takes part in choosing the most useful training examples. Depending on the query strategy, active learning can be combined with other machine learning methods to reduce annotation cost while maintaining accuracy. This paper proposes a new active learning strategy based on Information Density (ID), integrated with CRFs, for Chinese Named Entity Recognition (NER). On the SIGHAN Bakeoff 2006 MSRA NER corpus, an F1 score of 77.2% is achieved using only 10,000 labeled training sentences chosen by the proposed active learning strategy.
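
The abstract does not spell out the proposed strategy, so the following is only a minimal sketch of the generic information-density idea from the active learning literature, not the authors' method. It assumes each unlabeled sentence already has an uncertainty score (for example, one minus the probability a CRF assigns to its best label sequence) and a non-negative feature vector. The names `cosine`, `information_density`, `select_batch`, and the parameter `beta` are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def information_density(uncertainty, features, beta=1.0):
    """Weight each sentence's uncertainty by its average similarity to the pool.

    uncertainty: one score per unlabeled sentence (assumed to be supplied
                 by the model, e.g. a CRF; not computed here).
    features:    one non-negative feature vector per sentence (assumption).
    beta:        controls how much the density term matters; 0 disables it.
    """
    n = len(features)
    scores = []
    for i in range(n):
        # Average similarity to every sentence in the pool, including itself.
        avg_sim = sum(cosine(features[i], features[j]) for j in range(n)) / n
        scores.append(uncertainty[i] * avg_sim ** beta)
    return scores


def select_batch(uncertainty, features, k, beta=1.0):
    """Return the indices of the k highest-scoring sentences to annotate next."""
    scores = information_density(uncertainty, features, beta)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy pool: sentences 0 and 1 are near-duplicates, sentence 2 is an outlier.
    pool_features = [[1, 0], [1, 0], [0, 1]]
    pool_uncertainty = [0.5, 0.5, 0.9]
    print(select_batch(pool_uncertainty, pool_features, k=1))
```

In this toy pool the outlier has the highest raw uncertainty, but the density term favors a sentence that is representative of the rest of the pool. Setting `beta=0` reduces the score to plain uncertainty sampling.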