Class error rates for evaluation of machine translation output

Authors:
Maja Popović
Affiliations:
German Research Center for Artificial Intelligence (DFKI), Language Technology (LT), Berlin, Germany
Venue:
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Year:
2012

Citing 6
Cited 1

The best of two worlds: cooperation of statistical and rule-based taggers for Czech

ACL '07 Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
Further meta-evaluation of machine translation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Towards automatic error analysis of machine translation output

Computational Linguistics
Findings of the 2011 Workshop on Statistical Machine Translation

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation

Findings of the 2012 workshop on statistical machine translation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

We investigate the use of error classification results for automatic evaluation of machine translation output. Five basic error classes are taken into account: morphological errors, syntactic (reordering) errors, missing words, extra words and lexical errors. In addition, linear combinations of these categories are investigated. Correlations between the class error rates and human judgments are calculated on the data of the third, fourth, fifth and sixth shared tasks of the Statistical Machine Translation Workshop. Machine translation outputs in five different European languages are used: English, Spanish, French, German and Czech. The results show that the following combinations are the most promising: the sum of all class error rates, the weighted sum optimised for translation into English and the weighted sum optimised for translation from English.