Translation model adaptation by resampling

Authors:
Kashif Shah;Loïc Barrault;Holger Schwenk
Affiliations:
University of Le Mans, Le Mans, France;University of Le Mans, Le Mans, France;University of Le Mans, Le Mans, France
Venue:
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Year:
2010

Citing 9
Cited 1

Language model adaptation for statistical machine translation with structured query models

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Exploiting N-best hypotheses for SMT self-enhancement

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Language and translation model adaptation using comparable corpora

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Mixture-model adaptation for SMT

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Domain adaptation in statistical machine translation with mixture modelling

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Experiments in domain adaptation for statistical machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Discriminative corpus weight estimation for machine translation

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
CONDOR, a new parallel, constrained extension of Powell's UOBYQA algorithm: Experimental results and comparison with the DFO algorithm

Journal of Computational and Applied Mathematics

Analysing the effect of out-of-domain data on SMT systems

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

The translation model of statistical machine translation systems is trained on parallel data coming from various sources and domains. These corpora are usually concatenated, word alignments are calculated and phrases are extracted. This means that the corpora are not weighted according to their importance to the domain of the translation task. This is in contrast to the training of the language model for which well known techniques are used to weight the various sources of texts. On a smaller granularity, the automatic calculated word alignments differ in quality. This is usually not considered when extracting phrases either. In this paper we propose a method to automatically weight the different corpora and alignments. This is achieved with a resampling technique. We report experimental results for a small (IWSLT) and large (NIST) Arabic/English translation tasks. In both cases, significant improvements in the BLEU score were observed.