METEOR, M-BLEU and M-TER: evaluation metrics for high-correlation with human rankings of machine translation output

  • Authors:
  • Abhaya Agarwal; Alon Lavie

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
  • Year:
  • 2008


Abstract

This paper describes our submissions to the machine translation evaluation shared task in ACL WMT-08. Our primary submission is the Meteor metric, tuned to optimize correlation with human rankings of translation hypotheses. We show a significant improvement in correlation compared to the earlier version of the metric, which was tuned to optimize correlation with traditional adequacy and fluency judgments. We also describe m-bleu and m-ter, enhanced versions of two other widely used metrics, bleu and ter respectively, which extend the exact word matching used in these metrics with the flexible matching based on stemming and WordNet used in Meteor.
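The staged matching the abstract refers to can be illustrated with a small sketch. The following Python snippet (using NLTK's Porter stemmer and WordNet interface) shows the general idea of falling back from exact matching to stem matching to synonym matching; the function names and structure are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of staged flexible matching (exact -> stem -> WordNet
# synonym), in the spirit of Meteor's matcher. Illustrative only; not the
# authors' code. Requires: pip install nltk, then nltk.download('wordnet').
from nltk.stem import PorterStemmer
from nltk.corpus import wordnet as wn

stemmer = PorterStemmer()

def synonyms(word):
    """Collect all WordNet lemma names that share a synset with `word`."""
    return {lemma.lower().replace("_", " ")
            for syn in wn.synsets(word)
            for lemma in syn.lemma_names()}

def words_match(hyp_word, ref_word):
    """Return the stage at which the words match, or None if they do not."""
    hyp, ref = hyp_word.lower(), ref_word.lower()
    if hyp == ref:
        return "exact"
    if stemmer.stem(hyp) == stemmer.stem(ref):
        return "stem"
    if ref in synonyms(hyp) or hyp in synonyms(ref):
        return "synonym"
    return None

# "walked"/"walking" fail exact matching but share the Porter stem "walk";
# "auto"/"car" fail both but share a WordNet synset, so they match as synonyms.
print(words_match("walked", "walking"))  # -> "stem"
print(words_match("auto", "car"))        # -> "synonym"
```

In m-bleu and m-ter, matches found this way would stand in for the exact word matches that bleu and ter normally require, which is what lets those metrics credit hypotheses that use morphological variants or synonyms of reference words.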