Maximum expected BLEU training of phrase and lexicon translation models

Authors:
Xiaodong He; Li Deng
Affiliations:
Microsoft Research, Redmond, WA; Microsoft Research, Redmond, WA
Venue:
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Year:
2012



Abstract

This paper proposes a new discriminative training method for constructing phrase and lexicon translation models. To reliably learn the very large number of parameters in these models, we propose an objective based on an expected BLEU score utility function with KL regularization, and train the models on a large parallel dataset. For training, we derive growth transformations for the phrase and lexicon translation probabilities that iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, yields a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In the IWSLT 2011 benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
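The abstract does not spell out the objective, so the following is only a minimal sketch of what an expected-BLEU utility with a KL penalty might look like when approximated over an n-best list. Everything here is an assumption made for illustration: the function names, the use of precomputed per-hypothesis BLEU values, the softmax posterior over hypothesis scores, the log of the expected BLEU as the utility term, and the weight `tau` are not taken from the paper. The growth-transformation updates themselves are not shown.

```python
import math


def posteriors(scores):
    """Softmax over log-domain model scores of the n-best hypotheses."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_bleu(scores, bleus):
    """Expectation of per-hypothesis BLEU under the model posterior."""
    return sum(p * b for p, b in zip(posteriors(scores), bleus))


def kl_divergence(prior, current):
    """KL(prior || current) for two distributions over the same outcomes."""
    return sum(p0 * math.log(p0 / p) for p0, p in zip(prior, current) if p0 > 0)


def regularized_objective(scores, bleus, prior, current, tau=1.0):
    """Assumed form: log expected BLEU minus tau times the KL penalty.

    `prior` stands in for the initial translation probabilities and
    `current` for the probabilities being trained (both hypothetical).
    """
    return math.log(expected_bleu(scores, bleus)) - tau * kl_divergence(prior, current)
```

With two equally scored hypotheses whose BLEU values are 0.2 and 0.4, the expected BLEU is their average, and the penalty vanishes when the trained probabilities equal the prior. Moving probability mass toward high-BLEU hypotheses raises the first term, while the KL term discourages drifting far from the initial model.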