We propose the use of morphemes for automatic evaluation of machine translation output, and systematically investigate a set of F-score and BLEU-score based metrics calculated on words, morphemes and POS tags, along with all corresponding combinations. Correlations between the new metrics and human judgments are calculated on the data of the third, fourth and fifth shared tasks of the Statistical Machine Translation Workshop. Machine translation outputs in five different European languages are used: English, Spanish, French, German and Czech. The results show that the F-scores which take into account morphemes and POS tags are the most promising metrics.
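
To make the idea of an n-gram F-score over different unit types concrete, here is a minimal sketch in Python. It is not the paper's exact formulation: the arithmetic averaging over n-gram orders, the equal weighting of the views, and the names `ngram_f_score` and `combined_f_score` are assumptions made for illustration, and the morpheme segmentation and POS tags in the example are written by hand rather than produced by a real tool.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token sequence as a multiset."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f_score(hypothesis, reference, max_n=4):
    """Mean over n = 1..max_n of the n-gram F1 between two sequences.

    The sequences may hold words, morphemes, or POS tags; the function
    only sees tokens. Orders too long for either sequence are skipped.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue
        overlap = sum((hyp & ref).values())  # clipped matching counts
        if overlap == 0:
            scores.append(0.0)
            continue
        precision = overlap / hyp_total
        recall = overlap / ref_total
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores) if scores else 0.0


def combined_f_score(hyp_views, ref_views, max_n=4):
    """Average the F-score over the views shared by both inputs.

    Equal weighting of views is an assumption of this sketch.
    """
    keys = sorted(hyp_views.keys() & ref_views.keys())
    if not keys:
        raise ValueError("no shared views to compare")
    return sum(ngram_f_score(hyp_views[k], ref_views[k], max_n) for k in keys) / len(keys)


# Hand-made toy example: segmentation and tags are illustrative only.
hyp_views = {
    "word": ["the", "cats", "sat"],
    "morpheme": ["the", "cat", "s", "sat"],
    "pos": ["DT", "NNS", "VBD"],
}
ref_views = {
    "word": ["the", "cat", "sat"],
    "morpheme": ["the", "cat", "sat"],
    "pos": ["DT", "NN", "VBD"],
}
print(combined_f_score(hyp_views, ref_views, max_n=2))
```

In this toy case the morpheme view credits the shared stem "cat" that the word view misses, which is the kind of partial match that motivates scoring on sub-word units.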