The segmentation of Chinese text is a key step in Chinese information processing. Its main difficulties are resolving ambiguous character strings and recognising unknown words. To obtain a correct segmentation, the first step is to identify all possible candidate words in a text. This paper proposes a data structure, the Chinese-character-net, and, based on it, presents a new algorithm that obtains all possible candidate words in a text. Experimental results are reported, and the characteristics of the algorithm are analysed.
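The abstract does not describe the internals of the Chinese-character-net. As an illustration only, the sketch below assumes a character-keyed trie built from a word list and scans every start position in the text, emitting each dictionary word it finds as a candidate. The names `CharNode`, `build_net` and `all_candidates`, and the toy lexicon, are hypothetical. Unknown-word recognition, which the paper also targets, is not modelled here.

```python
from typing import Dict, Iterable, List, Tuple


class CharNode:
    """Node of a hypothetical character net: a trie keyed by single characters."""

    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: Dict[str, "CharNode"] = {}
        self.is_word = False  # True if the path from the root spells a lexicon word


def build_net(lexicon: Iterable[str]) -> CharNode:
    """Insert every lexicon word into the trie, one character per edge."""
    root = CharNode()
    for word in lexicon:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, CharNode())
        node.is_word = True
    return root


def all_candidates(text: str, root: CharNode) -> List[Tuple[int, int, str]]:
    """Return every (start, end, word) span of `text` that is a lexicon word.

    Overlapping spans are all kept; choosing among them is left to a later
    disambiguation step (an assumption of this sketch).
    """
    candidates: List[Tuple[int, int, str]] = []
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:  # no lexicon word continues with this character
                break
            if node.is_word:
                candidates.append((start, end + 1, text[start:end + 1]))
    return candidates


# Toy lexicon (hypothetical) and a classic ambiguous string.
net = build_net(["研究", "研究生", "生命", "起源", "生", "命"])
result = all_candidates("研究生命起源", net)
print(result)
# [(0, 2, '研究'), (0, 3, '研究生'), (2, 3, '生'), (2, 4, '生命'), (3, 4, '命'), (4, 6, '起源')]
```

The overlap between 研究生 (positions 0 to 3) and 生命 (positions 2 to 4) is exactly the kind of ambiguous character string the abstract mentions: both candidates are kept so that a later stage can choose between them.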