This paper proposes a new criterion, minimum tag error (MTE), for discriminative training of conditional random fields (CRFs). The criterion is a smoothed approximation to the sentence labeling error: it aims to maximize the average of the transcription tagging accuracies of all possible sentences, weighted by their probabilities. Corpora from the second international Chinese word segmentation bakeoff (Bakeoff 2005) are used to test the effectiveness of the new training criterion. The experimental results show that the MTE criterion reliably improves on the initial performance of supervised CRFs. In particular, the recall rate of out-of-vocabulary words (R_oov) is significantly higher than that obtained with standard CRFs. Furthermore, the new training method proves robust in segmentation across all datasets.
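As a rough illustration of the quantity described above, the following Python sketch computes a probability-weighted average tagging accuracy for one toy sentence by brute-force enumeration. It is a minimal sketch under stated assumptions, not the paper's implementation: the two-tag set `B`/`I`, the score tables `unary` and `trans`, the function names, and the reading of "all possible sentences" as all candidate taggings of a single input are all hypothetical choices made for illustration. The abstract does not specify the exact accuracy function or how training uses it.

```python
import itertools
import math

# Hypothetical tag set for illustration: B = begins a word, I = inside a word.
TAGS = ("B", "I")


def sequence_score(tags, unary, trans):
    """Unnormalized log-score of one tag sequence under a linear-chain model."""
    score = sum(unary[i][t] for i, t in enumerate(tags))
    score += sum(trans[(a, b)] for a, b in zip(tags, tags[1:]))
    return score


def expected_tag_accuracy(reference, unary, trans):
    """Average per-tag accuracy over every candidate tagging,
    each weighted by its probability under the model."""
    candidates = list(itertools.product(TAGS, repeat=len(reference)))
    weights = [math.exp(sequence_score(c, unary, trans)) for c in candidates]
    z = sum(weights)  # partition function, computed by enumeration
    total = 0.0
    for cand, w in zip(candidates, weights):
        accuracy = sum(c == r for c, r in zip(cand, reference)) / len(reference)
        total += (w / z) * accuracy
    return total


# Toy inputs (assumed values, not taken from the paper).
reference = ("B", "I", "B")
zero_trans = {(a, b): 0.0 for a in TAGS for b in TAGS}
flat_unary = [{"B": 0.0, "I": 0.0} for _ in reference]
sharp_unary = [{t: (5.0 if t == r else 0.0) for t in TAGS} for r in reference]

uninformed = expected_tag_accuracy(reference, flat_unary, zero_trans)
confident = expected_tag_accuracy(reference, sharp_unary, zero_trans)
print(uninformed, confident)
```

With flat scores every tagging is equally likely, so the expected accuracy is that of random guessing between two tags. Scores that favor the reference tags push the value toward 1. Enumeration grows exponentially with sentence length, so a practical trainer would need a dynamic-programming computation over the chain instead.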