Joint and coupled bilingual topic model based sentence representations for language model adaptation

  • Authors:
  • Shixiang Lu;Xiaoyin Fu;Wei Wei;Xingyuan Peng;Bo Xu

  • Affiliations:
  • Interactive Digital Media Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China (all authors)

  • Venue:
  • IJCAI'13 Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
  • Year:
  • 2013

Abstract

This paper is concerned with data selection for adapting the language model (LM) in statistical machine translation (SMT), and aims to find LM training sentences that are topically similar to the translation task. Although traditional approaches have achieved good performance, they ignore topic information and the distribution of words when selecting similar training sentences. In this paper, we present two bilingual topic model (BLTM) based sentence representations, joint and coupled BLTM, for cross-lingual data selection. We map the data selection task into cross-lingual semantic representations that are language independent, then rank and select sentences in the target-language LM training corpus for each sentence in the translation task by a semantics-based likelihood. The semantic representations are learned from the parallel corpus, under the assumption that the two sides of a bilingual pair share the same or similar distribution over semantic topics. Large-scale experimental results demonstrate that our approaches significantly outperform state-of-the-art approaches on both LM perplexity and translation performance.
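
To make the ranking step concrete, the minimal Python sketch below illustrates one plausible form of the semantics-based likelihood, not the paper's actual implementation: it assumes a trained BLTM supplies a topic mixture `theta_task` inferred for a translation-task sentence and a target-language topic-word matrix `phi`, scores each candidate training sentence by the likelihood of its words under that mixture, and keeps the top-scoring sentences. The length normalization is an added assumption to avoid favoring short sentences.

```python
import numpy as np

def sentence_log_likelihood(word_ids, theta_task, phi):
    """log P(sentence | theta_task) = sum over words w of
    log sum_k theta_task[k] * phi[k, w].

    word_ids   : target-vocabulary indices of one candidate sentence
    theta_task : (K,) topic mixture inferred for a translation-task sentence
    phi        : (K, V) target-language topic-word distributions from the BLTM
    """
    word_probs = theta_task @ phi[:, word_ids]   # (len(word_ids),)
    return float(np.sum(np.log(word_probs + 1e-12)))

def select_topic_similar(corpus, theta_task, phi, top_n=100000):
    """Rank the target-language LM training corpus by the semantics-based
    likelihood and keep the top_n sentences; scores are length-normalized
    (an illustrative choice, not necessarily the paper's)."""
    scores = np.array([
        sentence_log_likelihood(s, theta_task, phi) / max(len(s), 1)
        for s in corpus
    ])
    return np.argsort(-scores)[:top_n]
```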