A grain of salt for the WMT manual evaluation

  • Authors:
  • Ondřej Bojar; Miloš Ercegovčević; Martin Popel; Omar F. Zaidan

  • Affiliations:
  • Charles University in Prague, Institute of Formal and Applied Linguistics (Bojar, Ercegovčević, Popel); Johns Hopkins University (Zaidan)

  • Venue:
  • WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
  • Year:
  • 2011

Abstract

The Workshop on Statistical Machine Translation (WMT) has become one of ACL's flagship workshops, held annually since 2006. In addition to soliciting papers from the research community, WMT also features a shared translation task for evaluating MT systems. This shared task is notable for having manual evaluation as its cornerstone. The Workshop's overview paper, playing a descriptive and administrative role, reports the main results of the evaluation without delving deeply into analyzing those results. The aim of this paper is to investigate and explain some interesting idiosyncrasies in the reported results, which only become apparent when performing a more thorough analysis of the collected annotations. Our analysis sheds some light on how the reported results should (and should not) be interpreted, and also gives rise to some helpful recommendations for the organizers of WMT.