Improving translation model by monolingual data

  • Authors:
  • Ondřej Bojar; Aleš Tamchyna

  • Affiliations:
  • Charles University in Prague; Charles University in Prague

  • Venue:
  • WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
  • Year:
  • 2011

Abstract

We use target-side monolingual data to extend the vocabulary of the translation model in statistical machine translation. This method, called "reverse self-training", improves the decoder's ability to produce grammatically correct translations into languages whose morphology is richer than that of the source language, especially in small-data settings. We empirically evaluate the gains for several pairs of European languages and discuss several approaches to the underlying back-off techniques needed to translate unseen forms of known words. We also describe the systems we submitted to the WMT11 Shared Task.
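
The abstract gives no implementation details, so the following is only a minimal word-level sketch of the general idea, not the authors' system. It assumes a toy parallel lexicon, a hypothetical lemma table (LEMMAS), and a hypothetical function name (reverse_self_train). Target-side forms that never occur in the parallel data are mapped back to a source word through their lemma (the back-off step), and the resulting synthetic pairs are added to the training data so that the forward model can produce those forms.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source word, target word)


def reverse_self_train(
    parallel: List[Pair],
    target_mono: List[str],
    lemma_of: Callable[[str], str],
) -> List[Pair]:
    """Extend toy parallel data with synthetic pairs for unseen target forms.

    Assumption: a real system would translate whole sentences with a
    reverse (target-to-source) model; here a one-word lexicon stands in.
    """
    # Reverse lexicon: target form -> source word, plus a lemma-level back-off.
    by_form: Dict[str, str] = {}
    by_lemma: Dict[str, str] = {}
    for src, tgt in parallel:
        by_form.setdefault(tgt, src)
        by_lemma.setdefault(lemma_of(tgt), src)

    synthetic: List[Pair] = []
    seen = set()
    for tgt in target_mono:
        if tgt in by_form or tgt in seen:
            continue  # form already covered by the translation model
        seen.add(tgt)
        # Back-off: an unseen form of a known word is translated via its lemma.
        src = by_lemma.get(lemma_of(tgt))
        if src is not None:
            synthetic.append((src, tgt))
    return parallel + synthetic


# Hypothetical lemma table for an English-to-Czech toy example.
LEMMAS = {"kočka": "kočka", "kočku": "kočka", "pes": "pes"}

extended = reverse_self_train(
    parallel=[("cat", "kočka")],
    target_mono=["kočku", "kočka", "pes", "kočku"],
    lemma_of=lambda w: LEMMAS.get(w, w),
)
print(extended)
```

In this toy run the accusative form "kočku" is unseen in the parallel data but shares a lemma with "kočka", so it gains a synthetic pair with "cat"; "pes" has no known lemma in the parallel data and is left out.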