Language model adaptation for statistical machine translation with structured query models
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Exploiting N-best hypotheses for SMT self-enhancement
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Language and translation model adaptation using comparable corpora
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Mixture-model adaptation for SMT
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Domain adaptation in statistical machine translation with mixture modelling
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Experiments in domain adaptation for statistical machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Discriminative corpus weight estimation for machine translation
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Journal of Computational and Applied Mathematics
Analysing the effect of out-of-domain data on SMT systems
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Hi-index | 0.00 |
The translation model of statistical machine translation systems is trained on parallel data coming from various sources and domains. These corpora are usually concatenated, word alignments are calculated and phrases are extracted. This means that the corpora are not weighted according to their importance to the domain of the translation task. This is in contrast to the training of the language model for which well known techniques are used to weight the various sources of texts. On a smaller granularity, the automatic calculated word alignments differ in quality. This is usually not considered when extracting phrases either. In this paper we propose a method to automatically weight the different corpora and alignments. This is achieved with a resampling technique. We report experimental results for a small (IWSLT) and large (NIST) Arabic/English translation tasks. In both cases, significant improvements in the BLEU score were observed.