Domain adaptation for statistical machine translation with domain dictionary and monolingual corpora

  • Authors:
  • Hua Wu; Haifeng Wang; Chengqing Zong

  • Affiliations:
  • Toshiba (China) R&D Center, Beijing, China; Toshiba (China) R&D Center, Beijing, China; Chinese Academy of Sciences, Beijing, China

  • Venue:
  • COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
  • Year:
  • 2008

Abstract

Statistical machine translation systems are usually trained on large amounts of bilingual and monolingual text. In this paper, we propose a method for domain adaptation in statistical machine translation when no in-domain bilingual corpora are available. The method first trains a baseline system on out-of-domain corpora and then uses in-domain translation dictionaries and in-domain monolingual corpora to improve in-domain performance. We propose an algorithm that combines these resources in a unified framework. Experimental results show that, compared with baselines trained only on out-of-domain corpora, our method achieves absolute improvements of 8.16 and 3.36 BLEU points on Chinese-to-English and English-to-French translation, respectively.
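The abstract does not spell out how the resources are combined, so the following is only a minimal illustrative sketch of one plausible reading, not the authors' algorithm. It assumes that (a) in-domain dictionary entries are turned into phrase-table entries with uniform translation probabilities, and (b) an in-domain language model estimated from monolingual text is linearly interpolated with an out-of-domain one. The function names, the unigram model, and the weight `lam` are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def dictionary_to_phrase_table(entries):
    """Assumption: each distinct translation of a source term gets a uniform probability."""
    by_source = defaultdict(list)
    for src, tgt in entries:
        if tgt not in by_source[src]:
            by_source[src].append(tgt)
    return {
        src: {tgt: 1.0 / len(tgts) for tgt in tgts}
        for src, tgts in by_source.items()
    }


def unigram_lm(sentences):
    """Maximum-likelihood unigram model over whitespace-separated tokens (toy stand-in for an n-gram LM)."""
    counts = Counter(word for sent in sentences for word in sent.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_in, p_out, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * out-of-domain."""
    vocab = set(p_in) | set(p_out)
    return {
        word: lam * p_in.get(word, 0.0) + (1.0 - lam) * p_out.get(word, 0.0)
        for word in vocab
    }


# Hypothetical toy data, for illustration only.
table = dictionary_to_phrase_table([("a", "x"), ("a", "y"), ("b", "z")])
lm_in = unigram_lm(["the valve", "the pump"])
lm_out = unigram_lm(["the cat"])
mixed = interpolate(lm_in, lm_out, lam=0.5)
```

In a real system the interpolation weight would be tuned on held-out in-domain data, and the dictionary-derived entries would be merged with the out-of-domain phrase table rather than used alone.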