Single character Chinese named entity recognition

  • Authors:
  • Xiaodan Zhu;Mu Li;Jianfeng Gao;Chang-Ning Huang

  • Affiliations:
  • Microsoft Research, Asia, Beijing, China;Microsoft Research, Asia, Beijing, China;Microsoft Research, Asia, Beijing, China;Microsoft Research, Asia, Beijing, China

  • Venue:
  • SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
  • Year:
  • 2003

Abstract

A single character named entity (SCNE) is a named entity (NE) composed of one Chinese character, such as "中" (zhong1, China) and "俄" (e2, Russia). SCNEs are very common in written Chinese text. However, because they have received little in-depth study, SCNEs are a major source of errors in named entity recognition (NER). This paper formulates SCNE recognition within the source-channel model framework. Our experiments show very encouraging results: an F-score of 81.01% for single character location name recognition and an F-score of 68.02% for single character person name recognition. An alternative view of the SCNE recognition problem is to formulate it as a classification task. We construct two classifiers, one based on the maximum entropy (ME) model and one based on the vector space model (VSM). We compare all proposed approaches and show that the source-channel model performs best in most cases.
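
For readers unfamiliar with the source-channel framework named in the abstract, its generic decision rule can be sketched as follows. The symbols are assumptions made for illustration and are not taken from the abstract: S denotes the observed character string, C a candidate sequence of classes (for example person name, location name, or ordinary word), and C* the sequence chosen by the recognizer. The paper's exact notation and model components may differ.

```latex
% Generic source-channel (noisy-channel) decision rule.
% S, C, and C^* are illustrative symbols, not the paper's notation.
\begin{align}
  C^{*} &= \operatorname*{arg\,max}_{C} \; P(C \mid S) \\
        &= \operatorname*{arg\,max}_{C} \; \frac{P(C)\, P(S \mid C)}{P(S)} \\
        &= \operatorname*{arg\,max}_{C} \; P(C)\, P(S \mid C)
\end{align}
```

The last step drops P(S) because it is constant across all candidates C. Under this reading, P(C) acts as a context model over class sequences and P(S | C) scores how likely each class is to generate the observed characters.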