Domain adaptation for statistical machine translation with monolingual resources

Authors:
Nicola Bertoldi; Marcello Federico
Affiliations:
FBK-irst -- Ricerca Scientifica e Tecnologica, Povo (TN), Italy; FBK-irst -- Ricerca Scientifica e Tecnologica, Povo (TN), Italy
Venue:
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Year:
2009


Abstract

Domain adaptation has recently gained interest in statistical machine translation as a way to cope with the performance drop observed when testing conditions deviate from training conditions. The basic idea is that in-domain training data can be exploited to adapt all components of an already developed system. Previous work showed small performance gains by adapting from limited in-domain bilingual data. Here, we aim instead at significant performance gains by exploiting large but cheap monolingual in-domain data, either in the source or in the target language. We propose to synthesize a bilingual corpus by translating the monolingual adaptation data into the counterpart language. Investigations were conducted on a state-of-the-art phrase-based system trained on the Spanish-English part of the UN corpus and adapted on the corresponding Europarl data. Translation, re-ordering, and language models were estimated after translating in-domain texts with the baseline system. By optimizing the interpolation of these models on a development set, the BLEU score was improved from 22.60% to 28.10% on a test set.
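The two steps named in the abstract, synthesizing a bilingual corpus from monolingual in-domain text and interpolating the resulting models on a development set, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `toy_translate` is a hypothetical word-by-word stand-in for the baseline system, the models are add-one smoothed unigram language models, and the weight is chosen by grid search on development-set perplexity. This swaps in a simple linear mixture for whatever interpolation and tuning procedure the paper actually uses to optimize BLEU.

```python
import math
from collections import Counter

# Hypothetical stand-in for the baseline system: word-by-word lookup.
TOY_LEXICON = {"el": "the", "parlamento": "parliament",
               "vota": "votes", "hoy": "today"}


def toy_translate(sentence):
    """Translate by dictionary lookup; unknown words are copied through."""
    return " ".join(TOY_LEXICON.get(w, w) for w in sentence.split())


def synthesize_bitext(monolingual_sentences, translate):
    """Pair each in-domain sentence with its machine translation."""
    return [(sent, translate(sent)) for sent in monolingual_sentences]


def unigram_model(sentences, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for sent in sentences for w in sent.split())
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(model_a, model_b, lam, sentences):
    """Perplexity of the mixture lam * model_a + (1 - lam) * model_b."""
    log_prob, n_words = 0.0, 0
    for sent in sentences:
        for w in sent.split():
            log_prob += math.log(lam * model_a[w] + (1 - lam) * model_b[w])
            n_words += 1
    return math.exp(-log_prob / n_words)


def tune_weight(model_a, model_b, dev_sentences, steps=20):
    """Grid search for the weight with the lowest development perplexity."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid,
               key=lambda lam: perplexity(model_a, model_b, lam, dev_sentences))


# Toy data (invented for this sketch, not taken from the UN or Europarl corpora).
out_of_domain_target = ["the council adopts the resolution",
                        "the assembly meets today"]
in_domain_source = ["el parlamento vota hoy", "el parlamento vota"]
dev_target = ["the parliament votes today"]

# Step 1: translate in-domain source text to obtain a synthetic bitext.
bitext = synthesize_bitext(in_domain_source, toy_translate)
synthetic_target = [tgt for _, tgt in bitext]

# Step 2: estimate a background and an adapted model, then tune the mixture.
vocab = {w for sents in (out_of_domain_target, synthetic_target, dev_target)
         for s in sents for w in s.split()}
background = unigram_model(out_of_domain_target, vocab)
adapted = unigram_model(synthetic_target, vocab)
best_weight = tune_weight(adapted, background, dev_target)
print("weight on adapted model:", best_weight)
```

In a real setting the synthetic bitext would feed the estimation of translation and re-ordering models as well, and the weights would be tuned against a translation-quality metric such as BLEU; perplexity is used here only because it keeps the sketch self-contained.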