Minimum tag error for discriminative training of conditional random fields

Authors:
Ying Xiong;Jie Zhu;Hao Huang;Haihua Xu
Affiliations:
Department of Electronic Engineering, Shanghai Jiao Tong University, 1164# Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Min Hang, Shanghai 200240, China (all authors)
Venue:
Information Sciences: an International Journal
Year:
2009

Abstract

This paper proposes a new criterion, minimum tag error (MTE), for discriminative training of conditional random fields (CRFs). The criterion is a smoothed approximation to the sentence labeling error: it aims to maximize the average of the tagging accuracies of all possible sentences, weighted by their probabilities. Corpora from the second international Chinese word segmentation bakeoff (Bakeoff 2005) are used to test the effectiveness of the new training criterion. The experimental results show that the proposed MTE criterion reliably improves the initial performance of supervised CRFs. In particular, the recall rate of out-of-vocabulary words (R_oov) is significantly higher than that obtained with standard CRFs. Furthermore, the new training method is robust in segmentation across all datasets.
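
The quantity described in the abstract, an average of tagging accuracies weighted by model probabilities, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a toy linear-chain model with hypothetical `emissions` and `transitions` score tables, measures accuracy as the fraction of tags that match a reference tagging, and enumerates every candidate tagging by brute force, which is feasible only for very short sequences. The abstract does not give the paper's exact objective or optimization procedure, so none of that is reproduced here.

```python
import itertools
import math


def sequence_score(tags, emissions, transitions):
    """Unnormalized log-score of one tag sequence under a toy linear-chain model."""
    score = sum(emissions[t][tag] for t, tag in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score


def expected_tag_accuracy(emissions, transitions, reference):
    """Probability-weighted average of per-tag accuracy over all candidate taggings.

    emissions[t][k]   : hypothetical score of tag k at position t
    transitions[j][k] : hypothetical score of moving from tag j to tag k
    reference         : the correct tag sequence (list of tag indices)
    """
    length = len(emissions)
    n_tags = len(emissions[0])
    # Brute-force enumeration of every tagging (illustration only).
    candidates = list(itertools.product(range(n_tags), repeat=length))
    log_scores = [sequence_score(c, emissions, transitions) for c in candidates]
    # Subtract the maximum before exponentiating for numerical stability.
    max_log = max(log_scores)
    weights = [math.exp(s - max_log) for s in log_scores]
    z = sum(weights)
    total = 0.0
    for cand, w in zip(candidates, weights):
        accuracy = sum(c == r for c, r in zip(cand, reference)) / length
        total += (w / z) * accuracy
    return total
```

With all scores set to zero the model is uniform over taggings, so with two tags the expected accuracy is one half; scores that strongly favor the reference push the value toward one. Training under a criterion of this kind would adjust the scores to raise this value, which is a smooth function of the parameters, unlike the raw count of tagging errors.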