Context-Based Approach for Covering Ambiguity Resolution in Chinese Word Segmentation

  • Authors:
  • Su-qin Feng;Su-qin Hou

  • Venue:
  • ICIC '09 Proceedings of the 2009 Second International Conference on Information and Computing Science - Volume 02
  • Year:
  • 2009

Abstract

Covering ambiguity is a vital issue in Chinese word segmentation. The challenge is that disambiguation depends on contextual information. This paper collects contextual statistics for words with covering ambiguity and derives a context calculation model based on the log-likelihood ratio. A weighting formula is designed to account for the window size and position of the contextual information and for the influence of frequency on covering ambiguity. On this basis, two disambiguation methods are used. The first takes the maximum log-likelihood ratio in the contextual information; the second takes the larger of the summed log-likelihood ratios under the combined and the separated readings of the contextual information. Fourteen frequently occurring words with covering ambiguity are used as examples. The average accuracy of the first method reaches 84.93%, and that of the second reaches 95.60%. The experimental results show that combining contextual information is effective for disambiguation.
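
The two decision rules described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption: `llr` is the standard log-likelihood ratio (G²) for a 2x2 co-occurrence table, `position_weight` is a hypothetical stand-in for the paper's weighting formula, and the toy scores are invented purely to show how the two rules can disagree.

```python
import math


def llr(k11, k12, k21, k22):
    """Standard log-likelihood ratio (G^2) for a 2x2 table of co-occurrence counts.

    k11: context word occurs with the reading, k12: context word without it,
    k21: reading without the context word, k22: neither.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def position_weight(distance, window):
    """Hypothetical weight (NOT the paper's formula): nearer context counts more."""
    return (window - distance + 1) / window


def decide_by_max(scores):
    """Rule 1: pick the reading that owns the single largest context score."""
    return max(scores, key=lambda reading: max(scores[reading], default=0.0))


def decide_by_sum(scores):
    """Rule 2: pick the reading whose context scores have the largest sum."""
    return max(scores, key=lambda reading: sum(scores[reading]))


# Invented, already-weighted scores for one ambiguous string in one sentence.
toy_scores = {
    "combined": [5.0, 1.0, 1.0],   # one strong cue, two weak ones
    "separated": [3.0, 3.0, 3.0],  # three moderate cues
}

print(decide_by_max(toy_scores))  # combined
print(decide_by_sum(toy_scores))  # separated
```

In this toy case the maximum rule follows the single strongest cue, while the sum rule pools all cues in the window. That is one plausible reading of why the abstract reports higher accuracy for the second method, though the paper's actual weighting and scoring may differ.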