Efficient extraction of oracle-best translations from hypergraphs

  • Authors:
  • Zhifei Li;Sanjeev Khudanpur

  • Affiliations:
  • The Johns Hopkins University, Baltimore, MD;The Johns Hopkins University, Baltimore, MD

  • Venue:
  • NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Hypergraphs are used in several syntax-inspired methods of machine translation to compactly encode exponentially many translation hypotheses. The hypotheses closest to given reference translations therefore cannot be found via brute force, particularly for popular measures of closeness such as BLEU. We develop a dynamic program for extracting the so called oracle-best hypothesis from a hypergraph by viewing it as the problem of finding the most likely hypothesis under an n-gram language model trained from only the reference translations. We further identify and remove massive redundancies in the dynamic program state due to the sparsity of n-grams present in the reference translations, resulting in a very efficient program. We present runtime statistics for this program, and demonstrate successful application of the hypotheses thus found as the targets for discriminative training of translation system components.