Why generative phrase models underperform surface heuristics

  • Authors:
  • John DeNero; Dan Gillick; James Zhang; Dan Klein

  • Affiliations:
  • University of California, Berkeley, CA (all authors)

  • Venue:
  • StatMT '06: Proceedings of the Workshop on Statistical Machine Translation
  • Year:
  • 2006

Abstract

We investigate why weights from generative models underperform heuristic estimates in phrase-based machine translation. We first propose a simple generative, phrase-based model and verify that its estimates are inferior to those given by surface statistics. The performance gap stems primarily from the addition of a hidden segmentation variable, which increases the capacity for overfitting during maximum likelihood training with EM. In particular, while word-level models benefit greatly from re-estimation, phrase-level models do not: the crucial difference is that distinct word alignments cannot all be correct, while distinct segmentations can. Alternate segmentations, rather than alternate alignments, compete, resulting in increased determinization of the phrase table, decreased generalization, and a decreased final BLEU score. We also show that interpolating the two methods can yield a modest increase in BLEU score.
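
A minimal sketch (not from the paper) of the interpolation idea in the final sentence: mixing heuristic relative-frequency phrase weights with EM-trained generative weights. The phrase pairs, probabilities, and the mixing weight lam below are hypothetical placeholders, not values reported by the authors.

    def interpolate_phrase_tables(heuristic, generative, lam=0.5):
        """Score each phrase pair as lam * heuristic + (1 - lam) * generative."""
        pairs = set(heuristic) | set(generative)
        return {
            pair: lam * heuristic.get(pair, 0.0)
                  + (1.0 - lam) * generative.get(pair, 0.0)
            for pair in pairs
        }

    # Toy example: the EM-trained table has "determinized" onto one phrase
    # pair, while the heuristic table spreads mass over alternatives.
    heuristic = {("maison", "house"): 0.6, ("maison", "home"): 0.4}
    generative = {("maison", "house"): 0.99, ("maison", "home"): 0.01}

    print(interpolate_phrase_tables(heuristic, generative, lam=0.5))
    # e.g. {('maison', 'house'): 0.795, ('maison', 'home'): 0.205} (order may vary)

The interpolated table keeps some of the heuristic estimate's spread over alternative translations while still incorporating the generative model's scores, which is consistent with the abstract's claim that combining the two methods can modestly improve BLEU.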