The Meteor metric for automatic evaluation of machine translation

  • Authors:
  • Alon Lavie; Michael J. Denkowski

  • Affiliations:
  • Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA (both authors)

  • Venue:
  • Machine Translation

  • Year:
  • 2009

Abstract

The Meteor Automatic Metric for Machine Translation evaluation, originally developed and released in 2004, was designed with the explicit goal of producing sentence-level scores that correlate well with human judgments of translation quality. Several key design decisions were incorporated into Meteor in support of this goal. In contrast with IBM's Bleu, which uses only precision-based features, Meteor uses and emphasizes recall in addition to precision, a property that several metric evaluations have confirmed to be critical for high correlation with human judgments. Meteor also addresses the problem of reference-translation variability by using flexible word matching, which allows morphological variants and synonyms to count as legitimate correspondences. Furthermore, the feature ingredients within Meteor are parameterized, so the metric's free parameters can be tuned to values that yield optimal correlation with human judgments. Optimal parameters can be tuned separately for different types of human judgments and for different languages. We discuss the initial design of the Meteor metric, subsequent improvements, and performance in several independent evaluations in recent years.
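
Since the abstract describes the scoring scheme only at a high level, the Python sketch below may help illustrate the general shape of Meteor's parameterized score: a recall-weighted harmonic mean of unigram precision and recall, discounted by a fragmentation penalty based on the number of matched "chunks". It assumes exact unigram matching only (the full metric also aligns stems and synonyms), and the function name, greedy alignment, and default parameter values are illustrative simplifications, not the released implementation.

```python
# Minimal sketch of Meteor's parameterized score, assuming exact-match
# unigrams only; the released metric also matches stems and WordNet
# synonyms. Defaults here mirror the original fixed settings
# (alpha=0.9, beta=3, gamma=0.5); the paper tunes these per language
# and per type of human judgment.

def meteor_sketch(hypothesis, reference, alpha=0.9, beta=3.0, gamma=0.5):
    hyp, ref = hypothesis.split(), reference.split()

    # Greedy one-to-one exact alignment: each reference token may
    # match at most one hypothesis token.
    used = [False] * len(ref)
    matches = []  # (hyp_index, ref_index) pairs, in hypothesis order
    for i, tok in enumerate(hyp):
        for j, rtok in enumerate(ref):
            if not used[j] and tok == rtok:
                used[j] = True
                matches.append((i, j))
                break

    m = len(matches)
    if m == 0:
        return 0.0

    precision = m / len(hyp)
    recall = m / len(ref)
    # Parameterized harmonic mean, weighted toward recall for alpha > 0.5.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)

    # Chunks: maximal runs of matches that are contiguous and identically
    # ordered in both strings; fewer chunks means better word order.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(matches, matches[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1

    # Fragmentation penalty lowers the score for scattered matches.
    penalty = gamma * (chunks / m) ** beta
    return (1.0 - penalty) * fmean


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    # Perfect match: one chunk, tiny residual penalty (~0.998).
    print(meteor_sketch("the cat sat on the mat", ref))
    # Same unigrams, scrambled order: four chunks, score drops (~0.852).
    print(meteor_sketch("the mat sat on the cat", ref))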