Morphemes and POS tags for n-gram based evaluation metrics

Authors:
Maja Popović
Affiliations:
German Research Center for Artificial Intelligence (DFKI), Language Technology (LT), Berlin, Germany
Venue:
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Year:
2011



Abstract

We propose the use of morphemes for automatic evaluation of machine translation output, and systematically investigate a set of F-score and BLEU-score based metrics calculated on words, morphemes and POS tags, along with all corresponding combinations. Correlations between the new metrics and human judgments are calculated on the data of the third, fourth and fifth shared tasks of the Statistical Machine Translation Workshop. Machine translation outputs in five different European languages are used: English, Spanish, French, German and Czech. The results show that the F-scores which take into account morphemes and POS tags are the most promising metrics.
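
As a rough illustration of the idea in the abstract, the sketch below computes an n-gram F-score over any token sequence, so the same function can be applied to words, morphemes, or POS tags. It is not the paper's exact formulation: the uniform averaging of n-gram precision and recall, the default `max_n=4`, the function names `ngram_counts`, `ngram_f_score` and `combined_f_score`, and the plain arithmetic mean used to combine representations are all assumptions made for this example. The segmentation into morphemes and the POS tagging are assumed to be done beforehand by external tools.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f_score(hyp, ref, max_n=4):
    """N-gram F-score of a hypothesis against a single reference.

    Precision and recall are averaged uniformly over n-gram orders
    1..max_n (an assumption of this sketch), then combined as F1.
    Orders longer than either sequence are skipped.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        hyp_total = sum(hyp_counts.values())
        ref_total = sum(ref_counts.values())
        if hyp_total == 0 or ref_total == 0:
            continue
        # Multiset intersection gives clipped n-gram matches.
        overlap = sum((hyp_counts & ref_counts).values())
        precisions.append(overlap / hyp_total)
        recalls.append(overlap / ref_total)
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def combined_f_score(hyp_views, ref_views, max_n=4):
    """Arithmetic mean of F-scores over several representations.

    hyp_views and ref_views map a representation name (for example
    "word", "morpheme", "pos") to the corresponding token sequence.
    The unweighted mean is an assumption of this sketch.
    """
    scores = [ngram_f_score(hyp_views[k], ref_views[k], max_n) for k in ref_views]
    return sum(scores) / len(scores)


# Hypothetical pre-tagged example: the word forms differ, the POS tags agree.
hyp = {"word": ["the", "cats", "sat"], "pos": ["DET", "NOUN", "VERB"]}
ref = {"word": ["the", "cat", "sat"], "pos": ["DET", "NOUN", "VERB"]}
print(combined_f_score(hyp, ref, max_n=2))
```

In this toy case the POS-level score is perfect while the word-level score is penalised for the inflection error, which shows how combining representations can credit a translation that is structurally correct even when the surface forms differ.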