Chinese named entity recognition with a sequence labeling approach: based on characters, or based on words?

  • Authors:
  • Zhangxun Liu;Conghui Zhu;Tiejun Zhao

  • Affiliations:
  • MOE-MS Key Laboratory of NLP and speech, Harbin Institute of Technology, Harbin, China;MOE-MS Key Laboratory of NLP and speech, Harbin Institute of Technology, Harbin, China;MOE-MS Key Laboratory of NLP and speech, Harbin Institute of Technology, Harbin, China

  • Venue:
  • ICIC'10 Proceedings of the Advanced intelligent computing theories and applications, and 6th international conference on Intelligent computing
  • Year:
  • 2010

Abstract

Named Entity Recognition (NER), an important problem in Natural Language Processing, underlies other applications such as Data Mining and Relation Extraction. Taking a sequence labeling approach, this paper asks which kind of token should serve as the unit of granularity in the NER task: characters or words. We use not only local context features within a sentence but also global knowledge features extracted from other occurrences of each word in the whole corpus. The results show that, without the global features, person names and location names are recognized well by the character-based model, whereas organization names are better handled by the word-based model. When global features are added, the performance of the word-based model improves significantly.
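To make the character-versus-word question concrete, the sketch below labels one sentence at both granularities. It is only an illustration: the BIO tagging scheme, the example sentence, its word segmentation, and the helper name `bio_tags` are assumptions made here, not details taken from the paper.

```python
from typing import List, Tuple

# Hypothetical entity annotation: (start, end_exclusive, type), in character offsets.
Entity = Tuple[int, int, str]


def bio_tags(tokens: List[str], entities: List[Entity]) -> List[str]:
    """Assign a BIO tag to each token, where tokens concatenate to the sentence.

    A token is tagged only if it lies entirely inside an entity span;
    a token that straddles an entity boundary is left as "O".
    """
    tags = []
    offset = 0
    for token in tokens:
        start, end = offset, offset + len(token)
        tag = "O"
        for ent_start, ent_end, ent_type in entities:
            if ent_start <= start and end <= ent_end:
                prefix = "B-" if start == ent_start else "I-"
                tag = prefix + ent_type
                break
        tags.append(tag)
        offset = end
    return tags


# Illustrative sentence: "Liu Xiang trains in Shanghai".
sentence = "刘翔在上海训练"
entities = [(0, 2, "PER"), (3, 5, "LOC")]

char_tokens = list(sentence)                  # one token per character
word_tokens = ["刘翔", "在", "上海", "训练"]    # assumed word segmentation

char_tags = bio_tags(char_tokens, entities)
word_tags = bio_tags(word_tokens, entities)

for token, tag in zip(char_tokens, char_tags):
    print(token, tag)
for token, tag in zip(word_tokens, word_tags):
    print(token, tag)
```

With characters as tokens, each entity spans several labels and no word segmenter is required. With words as tokens, the sequence is shorter and each token carries more lexical information, but a segmentation error that crosses an entity boundary leaves the entity unrecoverable, which this sketch models by tagging such tokens `O`.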