This paper presents a novel Chinese word segmentation approach that combines dictionary-based and statistics-based techniques. First, we transform a sentence into a word lattice using a dictionary. We then apply a support vector machine classifier to two main tasks: resolving segmentation ambiguities and recognizing out-of-vocabulary words. The classifier determines the position of the current character within a word, using some of its surrounding characters as features. Disambiguation prunes edges from the word lattice, and recognition appends edges to it. Finally, we output the segmentation by searching for the shortest path in the word lattice. Our experimental results show that the approach achieves an F-score of 92.8% on the PKU closed test of the second SIGHAN bakeoff.
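The lattice construction and shortest-path search can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the `max_len` parameter, the unit edge cost, and the single-character fallback edges are all assumptions made for illustration, and the SVM-driven pruning and appending of edges is not modeled.

```python
def build_lattice(sentence, dictionary, max_len=4):
    """Return edges[i]: end positions j such that sentence[i:j] is a lattice edge."""
    n = len(sentence)
    edges = [[] for _ in range(n)]
    for i in range(n):
        # Assumption: every single character is a fallback edge,
        # so at least one path through the lattice always exists.
        edges[i].append(i + 1)
        # Add an edge for every dictionary word starting at position i.
        for j in range(i + 2, min(n, i + max_len) + 1):
            if sentence[i:j] in dictionary:
                edges[i].append(j)
    return edges


def shortest_path_segment(sentence, dictionary, max_len=4):
    """Segment by the path with the fewest edges (assumed unit cost per edge)."""
    n = len(sentence)
    edges = build_lattice(sentence, dictionary, max_len)
    inf = float("inf")
    cost = [inf] * (n + 1)   # cost[j]: fewest words covering sentence[:j]
    back = [None] * (n + 1)  # back[j]: start position of the last word
    cost[0] = 0
    # Positions are already in topological order, so one left-to-right
    # pass of dynamic programming finds the shortest path.
    for i in range(n):
        for j in edges[i]:
            if cost[i] + 1 < cost[j]:
                cost[j] = cost[i] + 1
                back[j] = i
    # Follow back-pointers from the end of the sentence to recover the words.
    words = []
    j = n
    while j > 0:
        i = back[j]
        words.append(sentence[i:j])
        j = i
    return words[::-1]


if __name__ == "__main__":
    toy_dictionary = {"ab", "bc", "cde"}  # hypothetical toy dictionary
    print(shortest_path_segment("abcde", toy_dictionary))
```

With the toy dictionary above, the overlapping candidates "ab" and "bc" both appear as edges, and the search selects the two-word path "ab", "cde". In the approach described here, the classifier would remove or add edges before this search runs, so the shortest path reflects its decisions.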