Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Improved source-channel models for Chinese word segmentation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
HHMM-based Chinese lexical analyzer ICTCLAS
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Chinese segmentation and new word detection using conditional random fields
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Subword-based tagging for confidence-dependent Chinese word segmentation
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Minimum tag error for discriminative training of conditional random fields
Information Sciences: an International Journal
Competitive generative models with structure learning for NLP classification tasks
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Automatic evaluation of Chinese translation output: word-level or character-level?
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
What is the basic semantic unit of Chinese language? a computational approach based on topic models
MOL'11 Proceedings of the 12th biennial conference on The mathematics of language
Integrating Generative and Discriminative Character-Based Models for Chinese Word Segmentation
ACM Transactions on Asian Language Information Processing (TALIP)
Iterative annotation transformation with predict-self reestimation for Chinese word segmentation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Hi-index | 0.02 |
The character-based tagging approach is a dominant technique for Chinese word segmentation, and both discriminative and generative models can be adopted in that framework. However, generative and discriminative character-based approaches are significantly different and complement each other. A simple joint model combining the character-based generative model and the discriminative one is thus proposed in this paper to take advantage of both approaches. Experiments on the Second SIGHAN Bakeoff show that this joint approach achieves 21% relative error reduction over the discriminative model and 14% over the generative one. In addition, closed tests also show that the proposed joint model outperforms all the existing approaches reported in the literature and achieves the best F-score in four out of five corpora.