Chinese Named Entity Recognition combining a statistical model with human knowledge

  • Authors:
  • Youzheng Wu; Jun Zhao; Bo Xu

  • Affiliations:
  • Institute of Automation, Chinese Academy of Sciences, Beijing, China (all three authors)

  • Venue:
  • MultiNER '03 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition - Volume 15
  • Year:
  • 2003

Abstract

Named Entity Recognition (NER) is a key technique in natural language processing, information retrieval, question answering, and related fields. Chinese NER is harder than NER for languages such as English because Chinese text carries no capitalization information and word segmentation is uncertain. In this paper, we present a hybrid algorithm that combines a class-based statistical model with various types of human knowledge. To mitigate the data sparseness problem, we employ a back-off model and [Abstract contained text which could not be captured.], a Chinese thesaurus, to smooth the parameters of the model. On the newswire test data of the 1999 IEER evaluation in Mandarin, the F-measures for person names, location names, and organization names are 86.84%, 84.40%, and 76.22%, respectively.
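
The abstract reports results as F-measures, so a minimal sketch of how such a score can be computed may help in reading the numbers. It assumes the balanced F-measure (the harmonic mean of precision and recall) and exact matching of entity spans. The function name `f_measure`, the `(start, end, type)` span format, and the toy data are hypothetical and do not come from the paper; the official IEER scoring may differ, for example by giving partial credit.

```python
def f_measure(gold, predicted):
    """Return (precision, recall, F) for exact-match entity spans.

    Assumption: balanced F-measure, F = 2PR / (P + R).
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # spans with identical boundaries and type
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy data: (start, end, type) character spans.
gold = {(0, 3, "PER"), (5, 7, "LOC"), (10, 14, "ORG")}
predicted = {(0, 3, "PER"), (5, 7, "LOC"), (10, 13, "ORG"), (20, 22, "LOC")}

p, r, f = f_measure(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

In this toy case two of the four predicted spans are correct and two of the three gold spans are found, so precision is 2/4, recall is 2/3, and the F-measure is 4/7. Restricting both sets to a single entity type before calling the function would give a per-type score of the kind reported above.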