Combining classifiers for Chinese word segmentation

Authors:
Nianwen Xue; Susan P. Converse
Affiliations:
University of Pennsylvania, Philadelphia, PA; University of Pennsylvania, Philadelphia, PA
Venue:
SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Year:
2002

Abstract

In this paper we report the results of a supervised machine-learning approach to Chinese word segmentation. First, a maximum entropy tagger is trained on manually annotated data to automatically label characters with tags that indicate the position of each character within a word. An error-driven, transformation-based tagger is then trained to clean up tagging inconsistencies in the output of the first tagger. Finally, the tagged output is converted into segmented text. Preliminary results show that this approach is competitive with other supervised machine-learning segmenters reported in previous studies.
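The tagging pipeline described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The four position tags (LL word-initial, MM word-internal, RR word-final, LR single-character word), the single rule template "change tag X to Z when the previous tag is Y", and all helper names are hypothetical. The maximum entropy tagger is not reproduced: a hand-written noisy tag sequence stands in for its output, and a simplified greedy, error-driven learner plays the role of the transformation-based cleanup step.

```python
from typing import List, Optional, Tuple

# Hypothetical position tags assumed for this sketch:
#   LL = first character of a multi-character word
#   MM = word-internal character
#   RR = last character of a multi-character word
#   LR = single-character word
Rule = Tuple[str, str, str]  # (previous tag, current tag, replacement tag)


def words_to_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Turn a segmented sentence into (character, position tag) training pairs."""
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "LR"))
        else:
            pairs.append((word[0], "LL"))
            pairs.extend((ch, "MM") for ch in word[1:-1])
            pairs.append((word[-1], "RR"))
    return pairs


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Convert tagged characters back into words.

    A word opens at LL or LR and closes after RR or LR. If a new word
    opens before the previous one was closed (an inconsistent tagging),
    the pending characters are emitted as a word so nothing is lost.
    """
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("LL", "LR") and current:
            words.append(current)
            current = ""
        current += ch
        if tag in ("RR", "LR"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Rewrite every matching tag; conditions are read from the input tags (delayed effect)."""
    prev, cur, new = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i - 1] == prev and tags[i] == cur:
            out[i] = new
    return out


def accuracy(tags: List[str], gold: List[str]) -> int:
    """Number of positions where the tagging agrees with the gold standard."""
    return sum(t == g for t, g in zip(tags, gold))


def learn_rules(guess: List[str], gold: List[str], max_rules: int = 10) -> List[Rule]:
    """Greedy error-driven learning: repeatedly keep the rule with the largest net gain."""
    current = list(guess)
    rules: List[Rule] = []
    for _ in range(max_rules):
        # Propose candidate rules only at positions the current tagging gets wrong.
        candidates = {(current[i - 1], current[i], gold[i])
                      for i in range(1, len(current)) if current[i] != gold[i]}
        best_rule: Optional[Rule] = None
        best_gain = 0
        for rule in sorted(candidates):
            gain = accuracy(apply_rule(current, rule), gold) - accuracy(current, gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule reduces the error count any further
            break
        rules.append(best_rule)
        current = apply_rule(current, best_rule)
    return rules


if __name__ == "__main__":
    words = ["我", "喜欢", "北京大学"]
    pairs = words_to_tags(words)
    chars = [ch for ch, _ in pairs]
    gold = [tag for _, tag in pairs]
    # Stand-in for a first-pass tagger's output: 大 is wrongly tagged LL.
    guess = ["LR", "LL", "RR", "LL", "MM", "LL", "RR"]
    print(tags_to_words(chars, guess))  # ['我', '喜欢', '北京', '大学']
    rules = learn_rules(guess, gold)
    print(rules)  # [('MM', 'LL', 'MM')]
    fixed = guess
    for rule in rules:
        fixed = apply_rule(fixed, rule)
    print(tags_to_words(chars, fixed))  # ['我', '喜欢', '北京大学']
```

In this toy run the learned rule repairs an MM followed by LL, a sequence that cannot occur in a well-formed tagging, so 北京大学 is restored as one word instead of 北京 + 大学. A practical transformation-based learner would typically use richer templates (neighbouring characters as well as tags) and a much larger training corpus.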