Translation model adaptation by resampling

  • Authors:
  • Kashif Shah;Loïc Barrault;Holger Schwenk

  • Affiliations:
  • University of Le Mans, Le Mans, France;University of Le Mans, Le Mans, France;University of Le Mans, Le Mans, France

  • Venue:
  • WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The translation model of statistical machine translation systems is trained on parallel data coming from various sources and domains. These corpora are usually concatenated, word alignments are calculated and phrases are extracted. This means that the corpora are not weighted according to their importance to the domain of the translation task. This is in contrast to the training of the language model for which well known techniques are used to weight the various sources of texts. On a smaller granularity, the automatic calculated word alignments differ in quality. This is usually not considered when extracting phrases either. In this paper we propose a method to automatically weight the different corpora and alignments. This is achieved with a resampling technique. We report experimental results for a small (IWSLT) and large (NIST) Arabic/English translation tasks. In both cases, significant improvements in the BLEU score were observed.