Softmax-margin CRFs: training log-linear models with cost functions

Authors:
Kevin Gimpel;Noah A. Smith
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Citing 13
Cited 11

A maximum entropy approach to natural language processing

Computational Linguistics
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
An Alternate Objective Function for Markovian Fields

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Ranking algorithms for named-entity extraction: boosting and the voted perceptron

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Training conditional random fields with multivariate evaluation measures

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Maximum expected F-measure training of logistic regression models

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Online Passive-Aggressive Algorithms

The Journal of Machine Learning Research
Minimum risk annealing for training log-linear models

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Minimum tag error for discriminative training of conditional random fields

Information Sciences: an International Journal
First- and second-order expectation semirings with applications to minimum-risk training on translation forests

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1

Turbo parsers: dependency parsing by approximate variational inference

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Entropy and margin maximization for structured output learning

ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
An alternating direction method for dual MAP LP relaxation

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Improving context-aware query classification via adaptive self-training

Proceedings of the 20th ACM international conference on Information and knowledge management
Training a log-linear parser with loss functions via softmax-margin

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Recall-oriented learning of named entities in Arabic Wikipedia

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Minimum-risk training of approximate CRF-based NLP systems

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Structured ramp loss minimization for machine translation

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Optimized online rank learning for machine translation

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
CRF framework for supervised preference aggregation

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Minimum-risk training for semi-Markov conditional random fields with application to handwritten Chinese/Japanese text recognition

Pattern Recognition

Quantified Score

Hi-index	0.00

Visualization

Abstract

We describe a method of incorporating task-specific cost functions into standard conditional log-likelihood (CLL) training of linear structured prediction models. Recently introduced in the speech recognition community, we describe the method generally for structured models, highlight connections to CLL and max-margin learning for structured prediction (Taskar et al., 2003), and show that the method optimizes a bound on risk. The approach is simple, efficient, and easy to implement, requiring very little change to an existing CLL implementation. We present experimental results comparing with several commonly-used methods for training structured predictors for named-entity recognition.