Evaluation without references: IBM1 scores as evaluation metrics

  • Authors:
  • Maja Popović; David Vilar; Eleftherios Avramidis; Aljoscha Burchardt

  • Affiliations:
  • German Research Center for Artificial Intelligence (DFKI), Language Technology (LT), Berlin, Germany (all four authors)

  • Venue:
  • WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
  • Year:
  • 2011

Abstract

Current metrics for evaluating machine translation quality have a major drawback: they require human-quality reference translations. We propose a truly automatic evaluation metric based on IBM1 lexicon probabilities which does not need any reference translations. Several variants of IBM1 scores are systematically explored in order to find the most promising directions. Correlations between the new metrics and human judgments are calculated on the data of the third, fourth and fifth shared tasks of the Workshop on Statistical Machine Translation. Five European languages are taken into account: English, Spanish, French, German and Czech. The results show that the IBM1 scores are competitive with the classic evaluation metrics, the most promising being IBM1 scores calculated on morphemes and on POS 4-grams.
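
To illustrate the kind of reference-free score the abstract describes, here is a minimal sketch of a standard IBM Model 1 score of a translation output given its source sentence. It is not taken from the paper: the function name `ibm1_log_score`, the `(target, source)` keying of the lexicon, the probability floor, and the length normalisation are all assumptions made for illustration, and the toy lexicon values are invented. In practice the lexicon probabilities would be trained on a parallel corpus, and the same computation could be applied to morphemes or POS n-grams instead of words.

```python
import math


def ibm1_log_score(source_tokens, target_tokens, lexicon, floor=1e-10):
    """Length-normalised IBM Model 1 log-probability of target given source.

    `lexicon` maps (target_token, source_token) pairs to p(target | source).
    This layout is a hypothetical choice for the sketch.
    """
    if not target_tokens:
        return float("-inf")

    # IBM Model 1 adds an empty (NULL) source word that any target word
    # may align to.
    source_with_null = ["NULL"] + list(source_tokens)

    log_prob = 0.0
    for t in target_tokens:
        # Average the lexicon probability over all source positions.
        total = sum(lexicon.get((t, s), 0.0) for s in source_with_null)
        avg = total / len(source_with_null)
        # Floor unseen words so the logarithm stays finite (an assumption).
        log_prob += math.log(max(avg, floor))

    # Divide by target length so long and short outputs are comparable.
    return log_prob / len(target_tokens)


# Invented toy lexicon, for illustration only.
toy_lexicon = {
    ("house", "Haus"): 0.8,
    ("home", "Haus"): 0.2,
    ("the", "das"): 0.7,
    ("the", "NULL"): 0.1,
}

good = ibm1_log_score(["das", "Haus"], ["the", "house"], toy_lexicon)
bad = ibm1_log_score(["das", "Haus"], ["the", "tree"], toy_lexicon)
```

With this toy lexicon, `good` is higher than `bad`, because "tree" has no lexicon support from any source word. No reference translation is involved at any point; only the source sentence and the system output are needed.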