Better hypothesis testing for statistical machine translation: controlling for optimizer instability

Authors:
Jonathan H. Clark;Chris Dyer;Alon Lavie;Noah A. Smith
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

Citing 19
Cited 28

The statistical significance of the MUC-5 results

MUC5 '93 Proceedings of the 5th conference on Message understanding
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Online Passive-Aggressive Algorithms

The Journal of Machine Learning Research
Hierarchical Phrase-Based Translation

Computational Linguistics
Wide-coverage deep statistical parsing using automatic dependency structure annotation

Computational Linguistics
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Random restarts in minimum error rate training for statistical machine translation

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Online large-margin training of syntactic and structural translation features

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Using a maximum entropy model to build segmentation lattices for MT

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Regularization and search for minimum error rate training

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Stabilizing minimum error rate training

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Efficient Minimum Error Rate Training and Minimum Bayes-Risk decoding for translation hypergraphs and lattices

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Significance tests of automatic machine translation evaluation metrics

Machine Translation
Extending the meteor machine translation evaluation metric to the phrase level

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
The best lexical metric for phrase-based statistical MT system optimization

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
cdec: a decoder, alignment, and learning framework for finite-state and context-free translation models

ACLDemos '10 Proceedings of the ACL 2010 System Demonstrations
Monte Carlo techniques for phrase-based translation

Machine Translation
All of Statistics: A Concise Course in Statistical Inference

All of Statistics: A Concise Course in Statistical Inference

Unsupervised word alignment with arbitrary features

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Fuzzy syntactic reordering for phrase-based statistical machine translation

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Cutting the long tail: hybrid language models for translation style adaptation

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Perplexity minimization for translation model domain adaptation in statistical machine translation

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Structural and topical dimensions in multi-task patent translation

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Structured ramp loss minimization for machine translation

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Optimized online rank learning for machine translation

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Batch tuning strategies for statistical machine translation

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Prediction of learning curves in machine translation

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
A class-based agreement model for generating accurately inflected translations

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Modified distortion matrices for phrase-based statistical machine translation

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Mixing multiple translation models in statistical machine translation

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Inducing a discriminative parser to optimize machine translation reordering

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Transforming trees to improve syntactic convergence

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
CCG syntactic reordering models for phrase-based machine translation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Probes in a taxonomy of factored phrase-based models

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Selecting data for English-to-Czech machine translation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
GHKM rule extraction and scope-3 parsing in Moses

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Constructing parallel corpora for six Indian languages via crowdsourcing

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Analysing the effect of out-of-domain data on SMT systems

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
What types of word alignment improve statistical machine translation?

Machine Translation
Bagging and Boosting statistical machine translation systems

Artificial Intelligence
No free lunch in factored phrase-based machine translation

CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
Generalizing sampling-based multilingual alignment

Machine Translation
Modeling lexical cohesion for document-level machine translation

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Improving statistical machine translation by adapting translation models to translationese

Computational Linguistics
Generation of compound words in statistical machine translation into compounding languages

Computational Linguistics
Improving statistical machine translation by adapting translation models to translationese

Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, he runs an experiment to evaluate the behavior of the two systems on held-out data. In this paper, we consider how to make such experiments more statistically reliable. We provide a systematic analysis of the effects of optimizer instability---an extraneous variable that is seldom controlled for---on experimental outcomes, and make recommendations for reporting results more accurately.