Statistical methods for speech recognition
Statistical methods for speech recognition
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Chunking with support vector machines
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
The first international Chinese word segmentation Bakeoff
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Chinese word segmentation as LMR tagging
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
HHMM-based Chinese lexical analyzer ICTCLAS
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Adaptive Chinese word segmentation
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Chinese and Japanese word segmentation using word-level and character-level information
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Chinese segmentation and new word detection using conditional random fields
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Subword-based tagging for confidence-dependent Chinese word segmentation
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Subword-based tagging for confidence-dependent Chinese word segmentation
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Chinese word segmentation as morpheme-based lexical chunking
Information Sciences: an International Journal
TBL-improved non-deterministic segmentation and POS tagging for a Chinese parser
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
A character-based joint model for Chinese word segmentation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Integrating Generative and Discriminative Character-Based Models for Chinese Word Segmentation
ACM Transactions on Asian Language Information Processing (TALIP)
ROCLING '11 Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing
Joint Chinese word segmentation, POS tagging and parsing
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Hi-index | 0.00 |
We proposed a subword-based tagging for Chinese word segmentation to improve the existing character-based tagging. The subword-based tagging was implemented using the maximum entropy (MaxEnt) and the conditional random fields (CRF) methods. We found that the proposed subword-based tagging outperformed the character-based tagging in all comparative experiments. In addition, we proposed a confidence measure approach to combine the results of a dictionary-based and a subword-tagging-based segmentation. This approach can produce an ideal tradeoff between the in-vocaulary rate and out-of-vocabulary rate. Our techniques were evaluated using the test data from Sighan Bakeoff 2005. We achieved higher F-scores than the best results in three of the four corpora: PKU(0.951), CITYU(0.950) and MSR(0.971).