We proposed two approaches to improving Chinese word segmentation: subword-based tagging and a confidence measure. We found that the former outperformed existing character-based tagging, and that the latter improved segmentation further by combining subword-based tagging with dictionary-based segmentation. In addition, the confidence measure can be used to balance out-of-vocabulary rates and in-vocabulary rates. With these techniques we achieved higher F-scores on the CITYU, PKU, and MSR corpora than the best results from the SIGHAN Bakeoff 2005.
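The combination step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the confidence measure, so the code assumes a statistical tagger that returns one (tag, confidence) pair per character, a simple B/I tag scheme, forward maximum matching as the dictionary-based segmenter, and a single threshold. It also works on characters rather than subwords for brevity. The names `fmm_segment`, `combine`, and `threshold` are hypothetical.

```python
from typing import List, Set, Tuple


def fmm_segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Dictionary-based segmentation by forward maximum matching (assumed baseline)."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words


def words_to_tags(words: List[str]) -> List[str]:
    """B marks the first character of a word, I marks every later character."""
    tags: List[str] = []
    for w in words:
        tags.extend(["B"] + ["I"] * (len(w) - 1))
    return tags


def tags_to_words(text: str, tags: List[str]) -> List[str]:
    """Rebuild words from per-character B/I tags."""
    words: List[str] = []
    for ch, tag in zip(text, tags):
        if tag == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


def combine(text: str,
            tagger_output: List[Tuple[str, float]],
            lexicon: Set[str],
            threshold: float = 0.8) -> List[str]:
    """Keep the tagger's tag when it is confident, else use the dictionary's tag.

    `tagger_output` is a hypothetical list of (tag, confidence) pairs, one per
    character; how the confidence is computed is outside this sketch.
    """
    dict_tags = words_to_tags(fmm_segment(text, lexicon))
    merged = [tag if conf >= threshold else dict_tag
              for (tag, conf), dict_tag in zip(tagger_output, dict_tags)]
    return tags_to_words(text, merged)
```

With the text "研究生命起源", the lexicon {"研究", "研究生", "生命", "起源"}, and a tagger that outputs B I B I B I at confidence 0.9 everywhere, a threshold of 0.8 keeps the tagger's segmentation (研究 / 生命 / 起源), while a threshold of 0.95 falls back to the dictionary's (研究生 / 命 / 起源). Under these assumptions, raising the threshold leans on the dictionary and so favors in-vocabulary words, while lowering it leans on the tagger, which is one way such a measure could trade off in-vocabulary against out-of-vocabulary performance.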