A trigram statistical language model algorithm for Chinese word segmentation

  • Authors:
  • Jun Mao; Gang Cheng; Yanxiang He; Zehuan Xing

  • Affiliations:
  • Computer School, Wuhan University, Wuhan, P. R. China; Computer School, Wuhan University, Wuhan, P. R. China; Computer School, Wuhan University, Wuhan, P. R. China; Department of Linguistics, Central China Normal University, Wuhan, P. R. China

  • Venue:
  • FAW'07 Proceedings of the 1st annual international conference on Frontiers in algorithmics
  • Year:
  • 2007

Abstract

We address the problem of segmenting Chinese text into words and propose an algorithm based on a trigram statistical language model. We discuss why a statistical language model is well suited to Chinese word segmentation and present the segmentation algorithm. In particular, we address the search problem that arises with the trigram model and often leads to low performance. Finally, we discuss the identification of out-of-vocabulary (OOV) words and integrate it into the trigram-based method to improve segmentation accuracy.
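
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how trigram-based segmentation is commonly set up, not the authors' method. It scores each candidate segmentation with log P(w | u, v) and searches with Viterbi-style dynamic programming whose state is the last two words. The names `segment`, `logprob`, `UNIGRAM`, and `TRIGRAM`, the toy probabilities, the 0.4 backoff factor, and the single-character fallback for unknown text are all assumptions made for illustration.

```python
import math

# Hypothetical toy model: made-up probabilities, not estimated from any corpus.
UNIGRAM = {
    "研究": 0.2, "研究生": 0.1, "生命": 0.2, "起源": 0.2,
    "研": 0.01, "究": 0.01, "生": 0.02, "命": 0.02, "起": 0.02, "源": 0.02,
}
TRIGRAM = {("研究", "生命", "起源"): 0.9}
VOCAB = set(UNIGRAM)
BOS = "<s>"  # sentence-start padding symbol


def logprob(u, v, w):
    """Log P(w | u, v) with a crude backoff to a discounted unigram (assumed)."""
    if (u, v, w) in TRIGRAM:
        return math.log(TRIGRAM[(u, v, w)])
    return math.log(0.4 * UNIGRAM.get(w, 1e-6))


def segment(text, vocab=VOCAB, score=logprob, max_word_len=4):
    """Return the highest-scoring segmentation of text under the trigram score."""
    if not text:
        return []
    n = len(text)
    # best[i] maps the last two words (u, v) of a path ending at position i
    # to (path score, backpointer); later scores depend only on that pair.
    best = [dict() for _ in range(n + 1)]
    best[0][(BOS, BOS)] = (0.0, None)
    for i in range(n):
        for (u, v), (total, _) in best[i].items():
            for j in range(i + 1, min(n, i + max_word_len) + 1):
                w = text[i:j]
                # Single characters are always allowed, so every input
                # has at least one segmentation (assumed fallback).
                if j - i > 1 and w not in vocab:
                    continue
                cand = total + score(u, v, w)
                key = (v, w)
                if key not in best[j] or cand > best[j][key][0]:
                    best[j][key] = (cand, (i, (u, v)))
    # Backtrack from the best final state.
    key = max(best[n], key=lambda k: best[n][k][0])
    words, pos = [], n
    while pos > 0:
        words.append(key[1])
        pos, key = best[pos][key][1]
    return words[::-1]


print(segment("研究生命起源"))
```

Keeping only the best path per word pair at each position limits the search to one entry per (position, last two words) state instead of enumerating every segmentation. A practical system would estimate the probabilities from a segmented corpus and use a proper smoothing scheme.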