Decomposability of translation metrics for improved evaluation and efficient algorithms

Authors:
David Chiang;Steve DeNeefe;Yee Seng Chan;Hwee Tou Ng
Affiliations:
University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA;National University of Singapore, Law Link, Singapore;National University of Singapore, Law Link, Singapore
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Citing 18
Cited 18

Order-n correction for regular languages

Communications of the ACM
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Precision and recall of machine translation

NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Clause restructuring for statistical machine translation

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A discriminative global training algorithm for statistical MT

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
An end-to-end discriminative approach to machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Scalable inference and training of context-rich syntactic translation models

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
ORANGE: a method for evaluating automatic evaluation metrics for machine translation

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Hierarchical Phrase-Based Translation

Computational Linguistics
Re-evaluating machine translation results with paraphrase support

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Lattice Minimum Bayes-Risk decoding for statistical machine translation

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Complexity of finding the BLEU-optimal hypothesis in a confusion network

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Comparing reordering constraints for SMT using efficient Bleu oracle computation

SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
(Meta-) evaluation of machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Linguistic features for automatic evaluation of heterogenous MT systems

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
An empirical study in source word deletion for phrase-based statistical machine translation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation

A comparison of merging strategies for translation of German compounds

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Statistical post editing and dictionary extraction: Systran/Edinburgh submissions for ACL-WMT2009

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Feasibility of human-in-the-loop minimum error rate training

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Fast, cheap, and creative: evaluating translation quality using Amazon's Mechanical Turk

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
The NIST 2008 Metrics for machine translation challenge--overview, methodology, metrics, and results

Machine Translation
Metric and reference factors in minimum error rate training

Machine Translation
A unified approach to minimum risk training and decoding

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Assessing phrase-based translation models with oracle decoding

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
AMBER: a modified BLEU, enhanced ranking metric

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Regression and ranking based optimisation for sentence level machine translation evaluation

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Tuning as ranking

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Hope and fear for discriminative training of statistical translation models

The Journal of Machine Learning Research
Optimized online rank learning for machine translation

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Maximum expected BLEU training of phrase and lexicon translation models

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
PORT: a precision-order-recall MT evaluation metric for tuning

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Enhancing statistical machine translation with character alignment

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Bagging and Boosting statistical machine translation systems

Artificial Intelligence
Oracle decoding as a new way to analyze phrase-based machine translation

Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Bleu is the de facto standard for evaluation and development of statistical machine translation systems. We describe three real-world situations involving comparisons between different versions of the same systems where one can obtain improvements in Bleu scores that are questionable or even absurd. These situations arise because Bleu lacks the property of decomposability, a property which is also computationally convenient for various applications. We propose a very conservative modification to Bleu and a cross between Bleu and word error rate that address these issues while improving correlation with human judgments.