Soft syntactic constraints for Arabic---English hierarchical phrase-based translation

  • Authors:
  • Yuval Marton;David Chiang;Philip Resnik

  • Affiliations:
  • IBM T.J. Watson Research Center, Yorktown Heights, USA 10598;USC Information Sciences Institute (ISI), Marina del Rey, USA 90292;Department of Linguistics and the Laboratory for Computational Linguistics and Information Processing (CLIP) at the Institute for Advanced Computer Studies (UMIACS), University of Maryland, Colleg ...

  • Venue:
  • Machine Translation
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In adding syntax to statistical machine translation, there is a tradeoff between taking advantage of linguistic analysis and allowing the model to exploit parallel training data with no linguistic analysis: translation quality versus coverage. A number of previous efforts have tackled this tradeoff by starting with a commitment to linguistically motivated analyses and then finding appropriate ways to soften that commitment. We present an approach that explores the tradeoff from the other direction, starting with a translation model learned directly from aligned parallel text, and then adding soft constituent-level constraints based on parses of the source language. We argue that in order for these constraints to improve translation, they must be fine-grained: the constraints should vary by constituent type, and by the type of match or mismatch with the parse. We also use a different feature weight optimization technique, capable of handling large amount of features, thus eliminating the bottleneck of feature selection. We obtain substantial improvements in performance for translation from Arabic to English.