Revising word lattice using support vector machine for Chinese word segmentation

Authors:
Ming Zhong;Sheng Wang;Ming Wu
Affiliations:
Wuhan University, Wuhan, China;Wuhan University, Wuhan, China;Zhongnan University of Economics and Law, Wuhan, China
Venue:
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
Year:
2012

Abstract

This paper presents a novel Chinese word segmentation approach that combines dictionary-based and statistics-based techniques. First, we transform a linear sentence into a word lattice using a dictionary. Then we apply a classification method based on support vector machines to two main tasks: resolving segmentation ambiguities and recognizing out-of-vocabulary words. We determine the position of the current character within a word by using some of its surrounding characters as features. Disambiguation and recognition result in pruning edges from and appending edges to the word lattice. Lastly, we output the segmentation result by searching for the shortest path in the word lattice. Our experimental results show that our approach achieves an F-score of 92.8% in the PKU closed test of the second SIGHAN bakeoff.
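
The lattice-and-shortest-path pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy dictionary, the unit edge weights (so that the shortest path is the one with the fewest words), and the max_len limit are all assumptions made for illustration. The support vector machine step is not modeled here: its decisions are stood in for by the hypothetical pruned and appended edge sets, which a real system would derive from the classifier's character-position predictions.

```python
def build_lattice(sentence, dictionary, max_len=4):
    """Return lattice edges (i, j), each meaning sentence[i:j] is a candidate word."""
    n = len(sentence)
    edges = set()
    for i in range(n):
        edges.add((i, i + 1))  # single-character fallback keeps the lattice connected
        for j in range(i + 2, min(n, i + max_len) + 1):
            if sentence[i:j] in dictionary:
                edges.add((i, j))
    return edges


def shortest_path(n, edges):
    """Fewest-edges path from node 0 to node n (unit weights are an assumption)."""
    inf = float("inf")
    best = [inf] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0
    # Every edge points forward, so sorting by start node is a topological order.
    for i, j in sorted(edges):
        if best[i] + 1 < best[j]:
            best[j] = best[i] + 1
            back[j] = i
    path, pos = [], n
    while pos > 0:
        if back[pos] is None:
            raise ValueError("no path through the lattice")
        path.append((back[pos], pos))
        pos = back[pos]
    return path[::-1]


def segment(sentence, dictionary, pruned=(), appended=()):
    """Segment a sentence; pruned/appended are hypothetical classifier decisions."""
    edges = (build_lattice(sentence, dictionary) - set(pruned)) | set(appended)
    return [sentence[i:j] for i, j in shortest_path(len(sentence), edges)]


# Toy dictionary (an assumption, not taken from the paper).
TOY_DICT = {"研究", "研究生", "生命", "起源", "来了"}

# Disambiguation: pruning the edge for "研究生" leaves a single shortest path.
words = segment("研究生命起源", TOY_DICT, pruned={(0, 3)})

# Out-of-vocabulary recognition: appending an edge for the name "张伟".
words_oov = segment("张伟来了", TOY_DICT, appended={(0, 2)})
```

Without the pruned edge, the first sentence has two three-word paths of equal length (研究/生命/起源 and 研究生/命/起源), which is the kind of ambiguity the classification step is meant to resolve before the path search.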