Segment choice models: feature-rich models for global distortion in statistical machine translation

  • Authors:
  • Roland Kuhn;Denis Yuen;Michel Simard;Patrick Paul;George Foster;Eric Joanis;Howard Johnson

  • Affiliations:
  • National Research Council of Canada, Gatineau, Québec, Canada (all authors)

  • Venue:
  • HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
  • Year:
  • 2006

Abstract

This paper presents a new approach to distortion (phrase reordering) in phrase-based machine translation (MT). Distortion is modeled as a sequence of choices during translation. The approach yields trainable, probabilistic distortion models that are global: they assign a probability to each possible phrase reordering. These "segment choice" models (SCMs) can be trained on "segment-aligned" sentence pairs; they can be applied during decoding or rescoring. The approach yields a metric called "distortion perplexity" ("disperp") for comparing SCMs offline on test data, analogous to perplexity for language models. A decision-tree-based SCM is tested on Chinese-to-English translation, and outperforms a baseline distortion penalty approach at the 99% confidence level.
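
The analogy with language-model perplexity can be made concrete with a small sketch. The abstract does not give the formula for disperp, so the code below assumes the standard perplexity form: the exponential of the average negative log probability that a model assigns to each observed segment choice. The function names and the input layout (one list of choice probabilities per sentence) are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def reordering_log_prob(choice_probs: Sequence[float]) -> float:
    """Log probability of one complete reordering.

    Assumption: the reordering is a sequence of segment choices, and its
    probability is the product of the per-choice probabilities.
    """
    for p in choice_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"choice probability out of range: {p}")
    return sum(math.log(p) for p in choice_probs)


def distortion_perplexity(corpus: Sequence[Sequence[float]]) -> float:
    """Assumed disperp: exp of the mean negative log probability per choice.

    `corpus` holds one sequence per test sentence; each entry is the
    probability the model gave to the segment choice actually observed.
    """
    total_log_prob = 0.0
    n_choices = 0
    for sentence in corpus:
        total_log_prob += reordering_log_prob(sentence)
        n_choices += len(sentence)
    if n_choices == 0:
        raise ValueError("corpus contains no segment choices")
    return math.exp(-total_log_prob / n_choices)


if __name__ == "__main__":
    # Hypothetical model that is always uniform over four candidate segments.
    uniform_corpus = [[0.25, 0.25], [0.25]]
    print(distortion_perplexity(uniform_corpus))
```

Under this assumed definition, a model that always chooses uniformly among four candidate segments has a disperp of about 4, and a lower value means the model predicts the observed reorderings better.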