Covering ambiguity resolution in Chinese word segmentation based on contextual information

  • Authors:
  • Xiao Luo; Maosong Sun; Benjamin K. Tsou

  • Affiliations:
  • Tsinghua University, Beijing, China; Tsinghua University, Beijing, China; City University of Hong Kong, Hong Kong

  • Venue:
  • COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
  • Year:
  • 2002

Abstract

Covering ambiguity is one of the two basic types of ambiguity in Chinese word segmentation. We treat its resolution as equivalent to word sense disambiguation and use the classical vector space model from information retrieval to represent the contexts of ambiguous words. A variant of TF-IDF weighting is proposed, and a Chinese thesaurus is additionally used to cope with the data sparseness problem. We select 90 frequent cases of covering ambiguity as targets. The training set contains 77,654 sentences and the test set contains 19,242 sentences. Experimental results show that our model achieves 96.58% accuracy, outperforming both the original form of TF-IDF weighting and another baseline, a hidden Markov model.
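
The vector space formulation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the proposed weighting variant or how the thesaurus is applied, so the code below uses a standard smoothed TF-IDF as a stand-in and omits the thesaurus. The function names (`build_class_vectors`, `cosine`, `classify`), the labels `joined` and `split`, and the toy contexts for the string 马上 are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def build_class_vectors(training):
    """Build one TF-IDF vector per segmentation class.

    training maps a class label to a list of contexts, where each context
    is a list of words surrounding the ambiguous string.
    Uses standard smoothed TF-IDF (an assumption, not the paper's variant).
    """
    # Term frequency: pool all context words observed for each class.
    tf = {
        label: Counter(word for context in contexts for word in context)
        for label, contexts in training.items()
    }
    n_classes = len(tf)
    # Document frequency: number of classes in which each word occurs.
    df = Counter(word for counts in tf.values() for word in counts)
    # Smoothed inverse document frequency, always positive.
    idf = {w: math.log((1 + n_classes) / (1 + d)) + 1.0 for w, d in df.items()}
    vectors = {
        label: {w: count * idf[w] for w, count in counts.items()}
        for label, counts in tf.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(context, vectors, idf):
    """Return the class whose vector is most similar to the new context."""
    counts = Counter(context)
    # Words never seen in training carry no weight and are dropped.
    query = {w: c * idf[w] for w, c in counts.items() if w in idf}
    return max(vectors, key=lambda label: cosine(query, vectors[label]))


# Invented toy contexts for the string 马上, read either as one word
# ("joined") or as two words ("split"). Not data from the paper.
toy_training = {
    "joined": [["我", "就", "来"], ["他", "就", "到"]],
    "split": [["他", "从", "下来"], ["骑", "在", "下来"]],
}
toy_vectors, toy_idf = build_class_vectors(toy_training)
print(classify(["从", "下来"], toy_vectors, toy_idf))
print(classify(["我", "就", "来"], toy_vectors, toy_idf))
```

In this sketch each segmentation choice plays the role of a word sense, and a new occurrence is assigned to the choice whose pooled training contexts it most resembles. A thesaurus-based extension would presumably map context words to shared semantic classes before counting, which is one plausible way to reduce sparseness, though the abstract gives no details.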