Metric and reference factors in minimum error rate training

Authors:
Yifan He;Andy Way
Affiliations:
CNGL, School of Computing, Dublin City University, Dublin, Ireland;CNGL, School of Computing, Dublin City University, Dublin, Ireland
Venue:
Machine Translation
Year:
2010

Abstract

In Minimum Error Rate Training (MERT), Bleu is often used as the error function, even though it has been shown to correlate less well with human judgment than other metrics such as Meteor and Ter. In this paper, we present empirical results showing that parameters tuned on Bleu can yield sub-optimal Bleu scores under certain data conditions. Tuning on a different metric, e.g. Meteor, improves such scores significantly: by 0.0082 Bleu, a 3.38% relative improvement, on the WMT08 English-French data. We analyze how the number of references and the choice of metric influence the result of MERT, and we experiment on different data sets. We show the problems of tuning on a metric that is not designed for the single-reference scenario and point out some possible solutions.
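
The two improvement figures in the abstract describe the same gain in absolute and relative terms. The short Python sketch below shows how they relate, assuming the relative figure is the absolute gain divided by the Bleu score obtained when tuning on Bleu. The baseline and tuned scores it prints are back-computed from the rounded figures in the abstract; they are an inference, not values reported here, and the variable names are illustrative only.

```python
# Figures quoted in the abstract (WMT08 English-French data).
absolute_gain = 0.0082  # Bleu gained by tuning on Meteor instead of Bleu
relative_gain = 0.0338  # the same gain, as a fraction of the baseline score

# Assumption: relative_gain = absolute_gain / baseline,
# so the baseline can be recovered by rearranging.
implied_baseline = absolute_gain / relative_gain
implied_tuned = implied_baseline + absolute_gain

# Both values are approximate because the inputs are rounded.
print(f"implied Bleu when tuned on Bleu:   {implied_baseline:.4f}")
print(f"implied Bleu when tuned on Meteor: {implied_tuned:.4f}")
```

Under that assumption the scores come out at roughly 0.24 and 0.25 Bleu, which gives a sense of the scale on which a gain of 0.0082 is measured.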