Robust trainability of single neurons
Journal of Computer and System Sciences
Deterministic annealing EM algorithm
Neural Networks
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Parsing algorithms and metrics
ACL '96 Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics
Learning Hidden Variable Networks: The Information Bottleneck Approach
The Journal of Machine Learning Research
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics - Volume 1
Annealing techniques for unsupervised statistical language learning
ACL '04 Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
Logarithmic opinion pools for conditional random fields
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Discriminative training via linear programming
ICASSP '99 Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 2
Vine parsing and minimum risk reranking for speed and precision
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Beyond log-linear models: boosted minimum error rate training for N-best Re-ranking
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Monte Carlo inference and maximization for phrase-based translation
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Online large-margin training of syntactic and structural translation features
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Lattice Minimum Bayes-Risk decoding for statistical machine translation
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Lattice-based minimum error rate training for statistical machine translation
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Training non-parametric features for statistical machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Regularization and search for minimum error rate training
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Joshua: an open source toolkit for parsing-based machine translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Variational decoding for statistical machine translation
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP - Volume 2
Consensus training for consensus decoding in machine translation
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing - Volume 3
Softmax-margin CRFs: training log-linear models with cost functions
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
The Cunei machine translation platform for WMT '10
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
A unified approach to minimum risk training and decoding
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Minimum error rate training by sampling the translation lattice
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Monte Carlo techniques for phrase-based translation
Machine Translation
The impact of language models and loss functions on repair disfluency detection
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Expected BLEU training for graphs: BBN system description for WMT11 system combination task
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
SampleRank training for phrase-based machine translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Optimal search for minimum error rate training
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Fast generation of translation forest for large-scale SMT discriminative training
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Minimum imputed risk: unsupervised discriminative training for machine translation
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Hope and fear for discriminative training of statistical translation models
The Journal of Machine Learning Research
Minimum-risk training of approximate CRF-based NLP systems
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Structured ramp loss minimization for machine translation
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Optimized online rank learning for machine translation
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Batch tuning strategies for statistical machine translation
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Maximum expected BLEU training of phrase and lexicon translation models
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Optimization strategies for online large-margin learning in machine translation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
When training the parameters for a natural language system, one would prefer to minimize 1-best loss (error) on an evaluation set. Since the error surface for many natural language problems is piecewise constant and riddled with local minima, many systems instead optimize log-likelihood, which is conveniently differentiable and convex. We propose training instead to minimize the expected loss, or risk. We define this expectation using a probability distribution over hypotheses that we gradually sharpen (anneal) to focus on the 1-best hypothesis. Besides the linear loss functions used in previous work, we also describe techniques for optimizing nonlinear functions such as precision or the BLEU metric. We present experiments training log-linear combinations of models for dependency parsing and for machine translation. In machine translation, annealed minimum risk training achieves significant improvements in BLEU over standard minimum error training. We also show improvements in labeled dependency parsing.
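The annealed-risk idea in the abstract can be illustrated with a minimal sketch. The hypothesis scores and losses below are invented toy values, and the distribution p_γ(h) ∝ exp(γ · score(h)) is one standard way to realize the "gradually sharpened" distribution the abstract describes: at γ = 0 the risk is the average loss over all hypotheses, and as γ grows the risk approaches the loss of the 1-best hypothesis.

```python
import math

def annealed_risk(scores, losses, gamma):
    """Expected loss (risk) under p_gamma(h) proportional to exp(gamma * score(h)).

    As gamma -> infinity, p_gamma concentrates on the highest-scoring
    hypothesis, so the risk approaches the 1-best loss.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(s * gamma for s in scores)
    weights = [math.exp(s * gamma - m) for s in scores]
    z = sum(weights)
    return sum((w / z) * l for w, l in zip(weights, losses))

# Toy n-best list: model scores and per-hypothesis losses (hypothetical values).
scores = [2.0, 1.5, 0.5]
losses = [0.3, 0.1, 0.8]

for gamma in (0.0, 1.0, 10.0, 100.0):
    # gamma = 0 gives the uniform average of the losses (0.4 here);
    # large gamma recovers the loss of the top-scoring hypothesis (0.3 here).
    print(gamma, annealed_risk(scores, losses, gamma))
```

In the training procedure the paper describes, this smoothed risk (unlike the piecewise-constant 1-best error) is differentiable in the model parameters, so it can be minimized by gradient methods while γ is annealed upward.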