Investigations on translation model adaptation using monolingual data

Authors:
Patrik Lambert;Holger Schwenk;Christophe Servan;Sadaf Abdul-Rauf
Affiliations:
LIUM, University of Le Mans, France;LIUM, University of Le Mans, France;LIUM, University of Le Mans, France;LIUM, University of Le Mans, France
Venue:
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Year:
2011

Citing 12
Cited 7

A systematic comparison of various statistical alignment models

Computational Linguistics
Language model adaptation for statistical machine translation with structured query models

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Four techniques for online handling of out-of-vocabulary words in Arabic-English statistical machine translation

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Exploiting N-best hypotheses for SMT self-enhancement

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Parallel implementations of word alignment tool

SETQA-NLP '08 Software Engineering, Testing, and Quality Assurance for Natural Language Processing
Mixture-model adaptation for SMT

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Domain adaptation in statistical machine translation with mixture modelling

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Experiments in domain adaptation for statistical machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Domain adaptation for statistical machine translation with monolingual resources

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Training phrase translation models with leaving-one-out

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
LIUM SMT machine translation system for WMT 2010

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

LIUM's SMT machine translation systems for WMT 2011

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Toward statistical machine translation without parallel corpora

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Adapting translation models to translationese improves SMT

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
ONTS: "optima" news translation system

EACL '12 Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Towards effective use of training data in statistical machine translation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
LIUM's SMT machine translation systems for WMT 2012

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Selecting data for English-to-Czech machine translation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most of the freely available parallel data to train the translation model of a statistical machine translation system comes from very specific sources (European parliament, United Nations, etc). Therefore, there is increasing interest in methods to perform an adaptation of the translation model. A popular approach is based on unsupervised training, also called self-enhancing. Both only use monolingual data to adapt the translation model. In this paper we extend the previous work and provide new insight in the existing methods. We report results on the translation between French and English. Improvements of up to 0.5 BLEU were observed with respect to a very competitive baseline trained on more than 280M words of human translated parallel data.