Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices

  • Authors:
  • Shankar Kumar; Wolfgang Macherey; Chris Dyer; Franz Och

  • Affiliations:
  • Google Inc., Mountain View, CA; Google Inc., Mountain View, CA; University of Maryland, College Park, MD; Google Inc., Mountain View, CA

  • Venue:
  • ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 1
  • Year:
  • 2009

Abstract

Minimum Error Rate Training (MERT) and Minimum Bayes-Risk (MBR) decoding are used in most current state-of-the-art Statistical Machine Translation (SMT) systems. Both algorithms were originally developed for N-best lists of translations and were recently extended to lattices, which encode many more hypotheses than typical N-best lists. Here, we extend lattice-based MERT and MBR to hypergraphs, which encode the vast number of translations produced by MT systems based on Synchronous Context-Free Grammars. The hypergraph algorithms are more efficient than the earlier lattice-based versions. We also show how MERT can be employed to optimize the parameters of MBR decoding. Our experiments demonstrate speedups for both MERT and MBR, as well as performance improvements from MBR decoding, on several language pairs.
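
For readers unfamiliar with the decision rule that the paper generalizes, the sketch below illustrates classic N-best MBR decoding: each hypothesis is scored by its expected gain (here, sentence-level BLEU) against every hypothesis in the list acting as a pseudo-reference, weighted by the model posterior, and the highest-expected-gain hypothesis is selected. This is a minimal, illustrative sketch, not the paper's lattice or hypergraph algorithm; the function names (`sentence_bleu`, `mbr_decode`) and the add-one smoothing are assumptions of this example.

```python
import math

def sentence_bleu(hyp, ref, max_n=4):
    """Smoothed sentence-level BLEU between two token lists (illustrative)."""
    if not hyp or not ref:
        return 0.0
    log_precisions = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = {}, {}
        for i in range(len(hyp) - n + 1):
            g = tuple(hyp[i:i + n])
            hyp_ngrams[g] = hyp_ngrams.get(g, 0) + 1
        for i in range(len(ref) - n + 1):
            g = tuple(ref[i:i + n])
            ref_ngrams[g] = ref_ngrams.get(g, 0) + 1
        matches = sum(min(c, ref_ngrams.get(g, 0)) for g, c in hyp_ngrams.items())
        total = max(len(hyp) - n + 1, 1)
        # add-one smoothing so a zero n-gram match does not zero out the score
        log_precisions += math.log((matches + 1) / (total + 1))
    # brevity penalty: exp(1 - |ref|/|hyp|) when the hypothesis is too short
    brevity = min(0.0, 1.0 - len(ref) / len(hyp))
    return math.exp(brevity + log_precisions / max_n)

def mbr_decode(nbest):
    """nbest: list of (token_list, posterior_probability) pairs.
    Returns the hypothesis with maximum expected BLEU under the posterior,
    i.e. minimum Bayes risk for the loss 1 - BLEU."""
    best_hyp, best_gain = None, float("-inf")
    for hyp, _ in nbest:
        # expected gain of `hyp` against all hypotheses as pseudo-references
        gain = sum(p * sentence_bleu(hyp, ref) for ref, p in nbest)
        if gain > best_gain:
            best_hyp, best_gain = hyp, gain
    return best_hyp
```

Note the cost quadratic in the size of the N-best list: every candidate is compared against every pseudo-reference. Avoiding this blow-up when the hypothesis space grows to lattices and hypergraphs is precisely what motivates the efficient algorithms the paper develops.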