We address the problem of segmenting Chinese text into words. In this paper, we propose a trigram-model-based algorithm for the task and discuss why a statistical language model is appropriate for Chinese word segmentation. In particular, we address the search problem that often leads to poor performance with trigram models. Finally, we discuss out-of-vocabulary (OOV) word identification and integrate it into the trigram-based method to improve segmentation accuracy.
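To make the idea of trigram-based segmentation concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes a hypothetical toy lexicon (`LEXICON`), invented trigram log-probabilities (`TRIGRAM_LOGP`), and a flat penalty for unseen trigrams (`UNSEEN_LOGP`). The search is a memoized dynamic program over (position, previous two words), which is one common way to keep trigram search tractable. Single characters are always allowed as words, a crude stand-in for OOV handling rather than the OOV identification method discussed in the abstract.

```python
import math
from functools import lru_cache

# Hypothetical toy resources, invented for illustration only.
LEXICON = {"研究", "研究生", "生命", "命", "的", "起源", "生"}
TRIGRAM_LOGP = {
    ("<s>", "<s>", "研究"): math.log(0.6),
    ("<s>", "研究", "生命"): math.log(0.7),
    ("研究", "生命", "的"): math.log(0.8),
    ("生命", "的", "起源"): math.log(0.9),
    ("<s>", "<s>", "研究生"): math.log(0.4),
}
UNSEEN_LOGP = math.log(1e-4)  # assumed flat penalty for unseen trigrams
MAX_WORD_LEN = 3              # assumed longest candidate word, in characters


def segment(text):
    """Return (words, log_probability) of the best-scoring segmentation."""

    @lru_cache(maxsize=None)
    def best(i, u, v):
        # Best segmentation of text[i:] given the two previous words u, v.
        if i == len(text):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(len(text), i + MAX_WORD_LEN) + 1):
            w = text[i:j]
            # Multi-character candidates must be in the lexicon; single
            # characters are always allowed as a fallback for unknown text.
            if j - i > 1 and w not in LEXICON:
                continue
            step = TRIGRAM_LOGP.get((u, v, w), UNSEEN_LOGP)
            rest_score, rest_words = best(j, v, w)
            candidates.append((step + rest_score, (w,) + rest_words))
        return max(candidates)

    score, words = best(0, "<s>", "<s>")
    return list(words), score


print(segment("研究生命的起源")[0])  # ['研究', '生命', '的', '起源']
```

With these toy numbers, the trigram context steers the search away from the greedy longest match 研究生 and toward 研究 / 生命, which is the kind of ambiguity a statistical language model is meant to resolve. A real system would estimate smoothed probabilities from a corpus instead of using a flat penalty.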