This paper proposes an unsupervised training approach to resolving overlapping ambiguities in Chinese word segmentation. We present an ensemble of adapted Naïve Bayesian classifiers that can be trained using an unlabelled Chinese text corpus. These classifiers differ in that they use context words within windows of different sizes as features. The performance of our approach is evaluated on a manually annotated test set. Experimental results show that the proposed approach achieves an accuracy of 94.3%, rivaling the rule-based and supervised training methods.
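As a rough illustration of the idea of an ensemble whose members differ only in context window size, the following minimal Python sketch trains several Naive Bayes classifiers on the same instances and combines them. Everything specific here is an assumption rather than the paper's method: the class name `WindowNaiveBayes`, the add-one smoothing, the combination rule (summing log-posteriors), the two segmentation labels `"AB|C"` and `"A|BC"`, and the placeholder tokens. The toy training instances are hand-labelled stand-ins; how training instances are obtained from an unlabelled corpus is not shown.

```python
import math
from collections import Counter, defaultdict


class WindowNaiveBayes:
    """Naive Bayes over context words within `window` tokens on each side.

    Hypothetical helper for illustration; `window` must be >= 1.
    """

    def __init__(self, window, alpha=1.0):
        self.window = window
        self.alpha = alpha  # add-alpha smoothing (an assumption)
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def features(self, left, right):
        # Keep the `window` nearest words on the left and on the right.
        return left[-self.window:] + right[:self.window]

    def fit(self, examples):
        for left, right, label in examples:
            self.class_counts[label] += 1
            for word in self.features(left, right):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_posteriors(self, left, right):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen words
        scores = {}
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            score = math.log(count / total)
            for word in self.features(left, right):
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            scores[label] = score
        # Normalise with log-sum-exp so the posteriors sum to 1.
        top = max(scores.values())
        log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
        return {label: s - log_z for label, s in scores.items()}


def ensemble_predict(classifiers, left, right):
    """Combine members by summing log-posteriors (one possible rule)."""
    totals = Counter()
    for clf in classifiers:
        for label, lp in clf.log_posteriors(left, right).items():
            totals[label] += lp
    return max(totals, key=totals.get)


# Toy instances: (left context, right context, segmentation label).
train = [
    (["p", "x1"], ["x2", "q"], "AB|C"),
    (["q", "x1"], ["x3", "p"], "AB|C"),
    (["p", "y1"], ["y2", "q"], "A|BC"),
    (["q", "y1"], ["y3", "p"], "A|BC"),
]

# One classifier per window size; the sizes 1 and 2 are arbitrary.
classifiers = [WindowNaiveBayes(k).fit(train) for k in (1, 2)]

pred1 = ensemble_predict(classifiers, ["q", "x1"], ["x2", "p"])
pred2 = ensemble_predict(classifiers, ["p", "y1"], ["y3", "q"])
print(pred1, pred2)
```

Summing log-posteriors is only one way to combine the members; majority voting or picking the most confident member would fit the same structure by changing `ensemble_predict` alone.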