Evaluation without references: IBM1 scores as evaluation metrics

Authors:
Maja Popović;David Vilar;Eleftherios Avramidis;Aljoscha Burchardt
Affiliations:
German Research Center for Artificial Intelligence (DFKI), Language Technology (LT), Berlin, Germany (all authors)
Venue:
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Year:
2011

Abstract

Current metrics for evaluating machine translation quality share a major drawback: they require human-quality reference translations. We propose a fully automatic evaluation metric based on IBM1 lexicon probabilities that does not need any reference translations. Several variants of IBM1 scores are systematically explored in order to find the most promising directions. Correlations between the new metrics and human judgments are calculated on the data of the third, fourth and fifth shared tasks of the Workshop on Statistical Machine Translation. Five European languages are taken into account: English, Spanish, French, German and Czech. The results show that the IBM1 scores are competitive with the classic evaluation metrics, the most promising being IBM1 scores calculated on morphemes and POS 4-grams.
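The abstract does not give the scoring formula. As a rough illustration only, the sketch below assumes the standard IBM Model 1 formulation, in which the probability of a target sentence given a source sentence is the product over target words of the summed lexicon probabilities over all source words (plus an empty NULL word), divided by (l+1)^m for source length l and target length m. The function name `ibm1_log_score`, the toy lexicon, the probability floor and the normalisation by target length are assumptions made for this example, not details taken from the paper.

```python
import math


def ibm1_log_score(source_tokens, target_tokens, lexicon, floor=1e-10):
    """Length-normalised log IBM1 probability of the target given the source.

    lexicon maps (target_word, source_word) pairs to p(target_word | source_word).
    The floor (an assumption of this sketch) avoids log(0) for unseen words.
    """
    if not target_tokens:
        raise ValueError("target sentence must not be empty")

    # IBM Model 1 lets every target word align to an extra empty source word.
    source_with_null = ["<NULL>"] + list(source_tokens)

    log_prob = 0.0
    for t in target_tokens:
        # Sum the lexicon probabilities of t over all source positions.
        total = sum(lexicon.get((t, s), 0.0) for s in source_with_null)
        log_prob += math.log(max(total, floor))

    # Uniform alignment term 1 / (l + 1)^m, applied in log space.
    log_prob -= len(target_tokens) * math.log(len(source_with_null))

    # Divide by target length so sentences of different lengths are comparable.
    return log_prob / len(target_tokens)


# Toy lexicon with invented probabilities, for illustration only.
toy_lexicon = {
    ("the", "das"): 0.7,
    ("the", "<NULL>"): 0.1,
    ("house", "Haus"): 0.8,
}

source = ["das", "Haus"]
good = ibm1_log_score(source, ["the", "house"], toy_lexicon)
bad = ibm1_log_score(source, ["the", "cat"], toy_lexicon)
print(good > bad)
```

Because the score only needs the source sentence, the translation output and a lexicon trained beforehand on parallel text, no reference translation enters the computation. The same function could be applied to other token types, such as morphemes or POS tags, if a lexicon over those units is available.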