Some advances in transformation-based part of speech tagging
AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
A compression-based algorithm for Chinese word segmentation
Computational Linguistics
Building a large-scale annotated Chinese corpus
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
The Penn Chinese TreeBank: Phrase structure annotation of a large corpus
Natural Language Engineering
Chinese lexical analysis using hierarchical hidden Markov model
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Chinese unknown word identification using class-based LM
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Revising word lattice using support vector machine for Chinese word segmentation
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
Hi-index | 0.00 |
In this paper we report results of a supervised machine-learning approach to Chinese word segmentation. First, a maximum entropy tagger is trained on manually annotated data to automatically labels the characters with tags that indicate the position of character within a word. An error-driven transformation-based tagger is then trained to clean up the tagging inconsistencies of the first tagger. The tagged output is then converted into segmented text. The preliminary results show that this approach is competitive compared with other supervised machine-learning segmenters reported in previous studies.