Robust trainability of single neurons
Journal of Computer and System Sciences
Deterministic annealing EM algorithm
Neural Networks
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Parsing algorithms and metrics
ACL '96 Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics
Learning Hidden Variable Networks: The Information Bottleneck Approach
The Journal of Machine Learning Research
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics - Volume 1
Annealing techniques for unsupervised statistical language learning
ACL '04 Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
Logarithmic opinion pools for conditional random fields
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Discriminative training via linear programming
ICASSP '99 Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 2
Vine parsing and minimum risk reranking for speed and precision
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Beyond log-linear models: boosted minimum error rate training for N-best Re-ranking
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Monte Carlo inference and maximization for phrase-based translation
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Online large-margin training of syntactic and structural translation features
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Lattice Minimum Bayes-Risk decoding for statistical machine translation
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Lattice-based minimum error rate training for statistical machine translation
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Training non-parametric features for statistical machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Regularization and search for minimum error rate training
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Joshua: an open source toolkit for parsing-based machine translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Variational decoding for statistical machine translation
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP - Volume 2
Consensus training for consensus decoding in machine translation
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing - Volume 3
Softmax-margin CRFs: training log-linear models with cost functions
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
The Cunei machine translation platform for WMT '10
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
A unified approach to minimum risk training and decoding
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Minimum error rate training by sampling the translation lattice
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Monte Carlo techniques for phrase-based translation
Machine Translation
The impact of language models and loss functions on repair disfluency detection
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Expected BLEU training for graphs: BBN system description for WMT11 system combination task
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
SampleRank training for phrase-based machine translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Optimal search for minimum error rate training
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Fast generation of translation forest for large-scale SMT discriminative training
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Minimum imputed risk: unsupervised discriminative training for machine translation
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Hope and fear for discriminative training of statistical translation models
The Journal of Machine Learning Research
Minimum-risk training of approximate CRF-based NLP systems
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Structured ramp loss minimization for machine translation
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Optimized online rank learning for machine translation
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Batch tuning strategies for statistical machine translation
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Maximum expected BLEU training of phrase and lexicon translation models
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Optimization strategies for online large-margin learning in machine translation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
When training the parameters for a natural language system, one would prefer to minimize 1-best loss (error) on an evaluation set. Since the error surface for many natural language problems is piecewise constant and riddled with local minima, many systems instead optimize log-likelihood, which is conveniently differentiable and convex. We propose training instead to minimize the expected loss, or risk. We define this expectation using a probability distribution over hypotheses that we gradually sharpen (anneal) to focus on the 1-best hypothesis. Besides the linear loss functions used in previous work, we also describe techniques for optimizing nonlinear functions such as precision or the BLEU metric. We present experiments training log-linear combinations of models for dependency parsing and for machine translation. In machine translation, annealed minimum risk training achieves significant improvements in BLEU over standard minimum error training. We also show improvements in labeled dependency parsing.
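The annealed-risk idea in the abstract can be illustrated with a minimal sketch. The hypothesis scores and losses below are invented toy values, and the distribution p_γ(h) ∝ exp(γ · score(h)) is one standard way to realize the "gradually sharpened" distribution the abstract describes: at γ = 0 the risk is the average loss over all hypotheses, and as γ grows the risk approaches the loss of the 1-best hypothesis.

```python
import math

def annealed_risk(scores, losses, gamma):
    """Expected loss (risk) under p_gamma(h) proportional to exp(gamma * score(h)).

    As gamma -> infinity, p_gamma concentrates on the highest-scoring
    hypothesis, so the risk approaches the 1-best loss.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(s * gamma for s in scores)
    weights = [math.exp(s * gamma - m) for s in scores]
    z = sum(weights)
    return sum((w / z) * l for w, l in zip(weights, losses))

# Toy n-best list: model scores and per-hypothesis losses (hypothetical values).
scores = [2.0, 1.5, 0.5]
losses = [0.3, 0.1, 0.8]

for gamma in (0.0, 1.0, 10.0, 100.0):
    # gamma = 0 gives the uniform average of the losses (0.4 here);
    # large gamma recovers the loss of the top-scoring hypothesis (0.3 here).
    print(gamma, annealed_risk(scores, losses, gamma))
```

In the training procedure the paper describes, this smoothed risk (unlike the piecewise-constant 1-best error) is differentiable in the model parameters, so it can be minimized by gradient methods while γ is annealed upward.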