Learning probabilistic synchronous CFGs for phrase-based translation

  • Authors:
  • Markos Mylonakis;Khalil Sima'an

  • Affiliations:
  • University of Amsterdam;University of Amsterdam

  • Venue:
  • CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Probabilistic phrase-based synchronous grammars are now considered promising devices for statistical machine translation because they can express reordering phenomena between pairs of languages. Learning these hierarchical, probabilistic devices from parallel corpora constitutes a major challenge, because of multiple latent model variables as well as the risk of data overfitting. This paper presents an effective method for learning a family of particular interest to MT, binary Synchronous Context-Free Grammars with inverted/monotone orientation (a.k.a. Binary ITG). A second contribution concerns devising a lexicalized phrase reordering mechanism that has complimentary strengths to Chiang's model. The latter conditions reordering decisions on the surrounding lexical context of phrases, whereas our mechanism works with the lexical content of phrase pairs (akin to standard phrase-based systems). Surprisingly, our experiments on French-English data show that our learning method applied to far simpler models exhibits performance indistinguishable from the Hiero system.