A stochastic finite-state word-segmentation algorithm for Chinese
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Word identification for Mandarin Chinese sentences
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 1
Recognizing unregistered names for Mandarin word identification
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 4
Comparing representations in Chinese information retrieval
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Chinese text retrieval without using a dictionary
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Chinese word segmentation without using lexicon and hand-crafted training data
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Covering ambiguity resolution in Chinese word segmentation based on contextual information
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Word extraction based on semantic constraints in chinese word-formation
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Hi-index | 0.00 |
Chinese word segmentation and POS tagging are two key techniques in many applications in Chinese information processing. Great efforts have been paid to the research in the last decade, but unfortunately, no practical system with high performance for unrestricted texts is available up to date. CSeg&Tag1.0, a Chinese word segmenter and POS tagger which unifies these two procedures into one model, is introduced in this paper. The preliminary open tests show that the segmentation precision of CSeg&Tag1.0 is about 98.0% - 99.3%, POS tagging precision about 91.0% - 97.1%, and the recall and precision for unknown words are ranging from 95.0% to 99.0% and from 87.6% to 95.3% respectively. The processing speed is about 100 characters per second on Pentium 133 PC. The work of improving the performance of the system is still ongoing.