Translation model based cross-lingual language model adaptation: from word models to phrase models

  • Authors:
  • Shixiang Lu;Wei Wei;Xiaoyin Fu;Bo Xu

  • Affiliations:
  • Chinese Academy of Sciences, Beijing, China;Chinese Academy of Sciences, Beijing, China;Chinese Academy of Sciences, Beijing, China;Chinese Academy of Sciences, Beijing, China

  • Venue:
  • EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
  • Year:
  • 2012

Abstract

In this paper, we propose a novel translation model (TM) based cross-lingual data selection model for language model (LM) adaptation in statistical machine translation (SMT), progressing from word models to phrase models. Given a source sentence in the translation task, the model directly estimates the probability that a sentence in the target LM training corpus is similar to it. Compared with traditional approaches that rely on first-pass translation hypotheses, the cross-lingual data selection model avoids the problem of noise proliferation. Furthermore, the phrase TM based cross-lingual data selection model is more effective than traditional approaches based on bag-of-words models and word-based TMs, because it captures contextual information by modeling the selection of a phrase as a whole. Experiments conducted on large-scale data sets demonstrate that our approach significantly outperforms the state-of-the-art approaches on both LM perplexity and SMT performance.
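
The word-model end of the idea described in the abstract can be illustrated with a small sketch. This is not the authors' model: it substitutes a length-normalized IBM Model 1 style score with uniform alignment, and the lexical table `TOY_TABLE`, the smoothing value `FLOOR`, and the function names `word_tm_log_score` and `select_sentences` are all hypothetical, invented here for illustration. The sketch scores each target-language sentence directly against the source sentence, so no first-pass translation hypothesis is needed.

```python
import math

# Hypothetical toy lexical table: (source_word, target_word) -> P(target_word | source_word).
# All values are invented for illustration and do not come from the paper.
TOY_TABLE = {
    ("mao", "cat"): 0.9,
    ("mao", "the"): 0.05,
    ("he", "drinks"): 0.8,
    ("niunai", "milk"): 0.9,
}

FLOOR = 1e-6  # assumed smoothing floor for word pairs missing from the table


def word_tm_log_score(source_words, target_words, table=TOY_TABLE, floor=FLOOR):
    """Length-normalized, IBM Model 1 style log-probability of target given source."""
    if not source_words or not target_words:
        return float("-inf")
    total = 0.0
    for t_word in target_words:
        # Uniform alignment: average the lexical probability over all source words.
        p = sum(table.get((s_word, t_word), floor) for s_word in source_words)
        total += math.log(p / len(source_words))
    # Normalize by target length so long sentences are not penalized outright.
    return total / len(target_words)


def select_sentences(source_words, corpus, k=1):
    """Return the k target sentences that score highest for the source sentence."""
    ranked = sorted(
        corpus,
        key=lambda sent: word_tm_log_score(source_words, sent),
        reverse=True,
    )
    return ranked[:k]


# Source sentence (romanized placeholder tokens) and a tiny target-side LM corpus.
source = ["mao", "he", "niunai"]
corpus = [
    ["stock", "prices", "fell"],
    ["the", "cat", "drinks", "milk"],
]
best = select_sentences(source, corpus, k=1)
print(best)
```

A phrase-based variant would, under the same assumptions, replace the per-word lookup with lookups over multi-word units, which is where the contextual information mentioned in the abstract would enter.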