Integration of Named Entity Information for Chinese Word Segmentation Based on Maximum Entropy

  • Authors:
  • Ka Seng Leong;Fai Wong;Yiping Li;Ming Chui Dong

  • Affiliations:
  • Faculty of Science and Technology of University of Macau, Taipa, China;Faculty of Science and Technology of University of Macau, Taipa, China;Faculty of Science and Technology of University of Macau, Taipa, China;Faculty of Science and Technology of University of Macau, Taipa, China

  • Venue:
  • ICIC '08 Proceedings of the 4th international conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications - with Aspects of Theoretical and Methodological Issues
  • Year:
  • 2008

Abstract

Word segmentation is an essential step in Chinese information processing. Although related research has been reported and has made progress, the Unknown Named Entity (UNE) problem in segmentation has not been fully solved, and it generally degrades segmentation accuracy. This paper presents a model that identifies UNEs in order to improve the overall performance of segmentation. To capture NE information, the functions of characters or words are defined with tags. In addition, useful surrounding contexts are collected from a corpus and used as features. The model is built on Maximum Entropy and treats UNE identification as a tagging problem. Experiments show that the overall accuracy of segmentation improves after the UNE identification module is integrated into the word segmenter.
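
As a rough illustration of the tagging formulation described in the abstract, the sketch below shows how context features might be extracted per character, how a conditional Maximum Entropy model scores tags from weighted features, and how a tag sequence maps back to words. The abstract does not give the tag set, feature templates, or weights, so the B/I/E/S tags, the window size, and every function name here are assumptions made for illustration only, not the authors' implementation.

```python
import math

# Assumed tag set (not taken from the paper): Begin, Inside, End, Single.
TAGS = ["B", "I", "E", "S"]


def context_features(chars, i, window=1):
    """Return string features for character i and its neighbours in a window."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        # Positions outside the sentence are padded with a placeholder symbol.
        c = chars[j] if 0 <= j < len(chars) else "<PAD>"
        feats.append(f"C{offset:+d}={c}")
    return feats


def maxent_probs(feats, weights):
    """Conditional maximum-entropy distribution p(tag | features).

    weights maps (feature, tag) pairs to real values; missing pairs count as 0.
    """
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in TAGS}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def tags_to_words(chars, tags):
    """Group characters into words: a word closes on an E or S tag."""
    words, current = [], ""
    for c, t in zip(chars, tags):
        current += c
        if t in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # keep any unfinished trailing word
        words.append(current)
    return words
```

For example, `tags_to_words(list("澳门大学"), ["B", "E", "B", "E"])` returns `["澳门", "大学"]`. In a trained system the weights would be estimated from an annotated corpus rather than set by hand, and the tags would presumably also encode named-entity roles, which this sketch leaves out.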