Oracle decoding as a new way to analyze phrase-based machine translation

  • Authors:
  • Guillaume Wisniewski;François Yvon

  • Affiliations:
  • LIMSI--Université Paris Sud, Orsay, France;LIMSI--Université Paris Sud, Orsay, France

  • Venue:
  • Machine Translation
  • Year:
  • 2013

Abstract

Extant Statistical Machine Translation systems are very complex pieces of software that embed multiple layers of heuristics and involve very large numbers of numerical parameters. As a result, it is difficult to analyze output translations, and there is a real need for tools that help developers better understand the various causes of errors. In this study, we take a step in that direction and present an attempt to evaluate the quality of the phrase-based translation model. To identify the translation errors that stem from deficiencies in the phrase table, we propose to compute the oracle BLEU-4 score, that is, the best score that a system based on this phrase table can achieve on a reference corpus. By casting the computation of the oracle BLEU-1 as an Integer Linear Programming problem, we show that it is possible to efficiently compute accurate lower bounds of this score, and we report measurements on several standard benchmarks. Various other applications of these oracle decoding techniques are also reported and discussed.