Domain adaptation for statistical machine translation with domain dictionary and monolingual corpora

Authors:
Hua Wu; Haifeng Wang; Chengqing Zong
Affiliations:
Toshiba (China) R&D Center, Beijing, China; Toshiba (China) R&D Center, Beijing, China; Chinese Academy of Sciences, Beijing, China
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Abstract

Statistical machine translation systems are usually trained on large amounts of bilingual and monolingual text. In this paper, we propose a method for domain adaptation of statistical machine translation in settings where no in-domain bilingual corpora exist. The method first trains a baseline system on out-of-domain corpora and then uses in-domain translation dictionaries and in-domain monolingual corpora to improve in-domain performance. We propose an algorithm that combines these different resources in a unified framework. Experimental results indicate that our method achieves absolute improvements of 8.16 and 3.36 BLEU points on Chinese-to-English and English-to-French translation, respectively, compared with baselines that use only out-of-domain corpora.
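The abstract does not say how the dictionary and the monolingual corpora are combined with the baseline, so the following is only an illustrative sketch of two generic adaptation steps, not the algorithm proposed in the paper. It assumes (1) that an in-domain language model is linearly interpolated with an out-of-domain one, and (2) that dictionary entries are turned into phrase-table entries with uniform translation probabilities. All function names, the toy data, and the weight `lam` are hypothetical.

```python
from collections import Counter, defaultdict


def unigram_lm(sentences):
    """Maximum-likelihood unigram model from whitespace-tokenised sentences."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_out, p_in, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * out-of-domain.

    `lam` is an assumed mixing weight; in practice it would be tuned
    on held-out in-domain text.
    """
    vocab = set(p_out) | set(p_in)
    return {w: lam * p_in.get(w, 0.0) + (1 - lam) * p_out.get(w, 0.0)
            for w in vocab}


def dictionary_to_phrase_table(entries):
    """Assumption: each source term spreads probability uniformly
    over its distinct dictionary translations."""
    by_source = defaultdict(list)
    for src, tgt in entries:
        if tgt not in by_source[src]:
            by_source[src].append(tgt)
    return {src: {t: 1.0 / len(tgts) for t in tgts}
            for src, tgts in by_source.items()}


# Toy data (hypothetical, for illustration only).
out_lm = unigram_lm(["the bank is open", "the river bank"])
in_lm = unigram_lm(["the enzyme binds the substrate"])
mixed = interpolate(out_lm, in_lm, lam=0.5)

table = dictionary_to_phrase_table([
    ("酶", "enzyme"),
    ("底物", "substrate"),
    ("底物", "substrate material"),
])
```

In this sketch, `mixed` remains a proper probability distribution because both inputs sum to one, and `table` could be appended to a baseline phrase table. A real system would use higher-order smoothed language models and tuned feature weights rather than these simplifications.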