Syntax-oriented evaluation measures for machine translation output

  • Authors:
  • Maja Popović; Hermann Ney

  • Affiliations:
  • RWTH Aachen University, Aachen, Germany (both authors)

  • Venue:
  • StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
  • Year:
  • 2009

Abstract

We explored novel automatic evaluation measures for machine translation output that are oriented to the syntactic structure of the sentence: the BLEU score computed on detailed Part-of-Speech (POS) tags, as well as precision, recall and F-measure obtained on POS n-grams. We also introduced an F-measure based on both word and POS n-grams. Correlations between the new metrics and human judgments were calculated on the data of the first, second and third shared tasks of the Statistical Machine Translation Workshop. Machine translation outputs in four different European languages were taken into account: English, Spanish, French and German. The results show that the new measures correlate very well with human judgments and that they are competitive with the widely used BLEU, METEOR and TER metrics.
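
To make the idea of precision, recall and F-measure on POS n-grams concrete, the following is a minimal sketch and not the authors' implementation. It assumes the hypothesis and reference have already been POS-tagged, that matches are counted with clipping, and that scores are averaged uniformly over n-gram orders 1 to 4; the abstract does not specify any of these choices, and the function names `ngrams` and `pos_ngram_prf` are hypothetical.

```python
from collections import Counter


def ngrams(tags, n):
    """Return a multiset (Counter) of n-grams over a sequence of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def pos_ngram_prf(hyp_tags, ref_tags, max_n=4):
    """Precision, recall and F-measure over POS n-grams.

    Assumptions (not taken from the paper): clipped n-gram matching and a
    uniform arithmetic mean over n-gram orders 1..max_n.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = ngrams(hyp_tags, n)
        ref = ngrams(ref_tags, n)
        # Counter intersection keeps the minimum count per n-gram (clipping).
        matched = sum((hyp & ref).values())
        hyp_total = sum(hyp.values())
        ref_total = sum(ref.values())
        precisions.append(matched / hyp_total if hyp_total else 0.0)
        recalls.append(matched / ref_total if ref_total else 0.0)
    p = sum(precisions) / max_n
    r = sum(recalls) / max_n
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Illustrative tag sequences (made up for this example, not from the paper's data).
hyp = ["DT", "NN", "VBZ", "JJ"]
ref = ["DT", "NN", "VBZ", "DT", "JJ"]
precision, recall, f_measure = pos_ngram_prf(hyp, ref)
```

A combined word-and-POS variant could in principle apply the same function to word sequences and average the two results, but how the paper combines them is not stated in the abstract.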