Linguistic features for automatic evaluation of heterogenous MT systems

  • Authors:
  • Jesús Giménez;Lluís Màrquez

  • Affiliations:
  • Universitat Politècnica de Catalunya, Barcelona;Universitat Politècnica de Catalunya, Barcelona

  • Venue:
  • StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Evaluation results recently reported by Callison-Burch et al. (2006) and Koehn and Monz (2006), revealed that, in certain cases, the BLEU metric may not be a reliable MT quality indicator. This happens, for instance, when the systems under evaluation are based on different paradigms, and therefore, do not share the same lexicon. The reason is that, while MT quality aspects are diverse, BLEU limits its scope to the lexical dimension. In this work, we suggest using metrics which take into account linguistic features at more abstract levels. We provide experimental results showing that metrics based on deeper linguistic information (syntactic/shallow-semantic) are able to produce more reliable system rankings than metrics based on lexical matching alone, specially when the systems under evaluation are of a different nature.