Bilingual LSA-based adaptation for statistical machine translation

Authors:
Yik-Cheung Tam; Ian Lane; Tanja Schultz
Affiliations:
Carnegie Mellon University, Pittsburgh, PA 15213, USA (all three authors)
Venue:
Machine Translation
Year:
2007

Abstract

We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation (SMT) based on bilingual latent semantic analysis (LSA). Bilingual LSA enables latent topic distributions to be transferred efficiently across languages by enforcing a one-to-one topic correspondence during training. Within this framework, model adaptation proceeds in two steps: first, the topic posterior distribution of the source text is inferred; second, the inferred distribution is applied to a target-language n-gram language model and to the translation lexicon via marginal adaptation. The background phrase table is then enhanced with additional phrase scores computed from the adapted translation lexicon. The framework also supports rapid bootstrapping of LSA models for new languages from a source LSA model of another language. We evaluate the approach on the Chinese-English MT06 test set with a medium-scale SMT system and the GALE SMT system, measuring BLEU and NIST scores. Both scores improve on both systems when the adapted language model and the adapted translation lexicon are applied individually, and the gains are additive when the two are applied together. Relative to the 95% confidence interval of the unadapted baseline system, the gain in both scores is statistically significant for the medium-scale SMT system, while the gain in the NIST score is statistically significant for the GALE SMT system.
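The two-step adaptation described in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names are hypothetical, the topic posterior `theta` is taken as already inferred from the source text, and marginal adaptation is assumed to take the common unigram-scaling form, in which a background probability P_bg(w|h) is multiplied by (P_lsa(w) / P_bg(w))^beta and renormalised. The toy numbers are invented and are not results from the paper.

```python
def topic_mixture_unigram(theta, topic_word):
    """Mix per-topic word distributions with topic weights theta.

    Returns P_lsa(w) = sum_k theta[k] * topic_word[k][w].
    """
    vocab_size = len(topic_word[0])
    return [
        sum(theta[k] * topic_word[k][w] for k in range(len(theta)))
        for w in range(vocab_size)
    ]


def marginal_adapt(bg_cond, bg_unigram, lsa_unigram, beta=0.5):
    """Scale P_bg(w|h) by (P_lsa(w) / P_bg(w)) ** beta and renormalise.

    beta is a hypothetical tuning exponent; 0.5 is an arbitrary choice.
    """
    scaled = [
        p * (lsa / bg) ** beta
        for p, lsa, bg in zip(bg_cond, lsa_unigram, bg_unigram)
    ]
    total = sum(scaled)
    return [s / total for s in scaled]


# Toy setup: 2 topics, 3 target-language words (all values invented).
# theta stands in for the topic posterior inferred from the SOURCE text;
# the one-to-one topic correspondence is what lets it index TARGET topics.
theta = [0.9, 0.1]
target_topic_word = [
    [0.7, 0.2, 0.1],  # topic 0 favours word 0
    [0.1, 0.2, 0.7],  # topic 1 favours word 2
]
bg_unigram = [0.4, 0.2, 0.4]  # background P_bg(w)
bg_cond = [0.3, 0.4, 0.3]     # background P_bg(w | h) for one history h

lsa_unigram = topic_mixture_unigram(theta, target_topic_word)
adapted = marginal_adapt(bg_cond, bg_unigram, lsa_unigram)
print([round(p, 3) for p in adapted])
```

Because the source text is dominated by topic 0 in this toy case, probability mass shifts toward the word that topic 0 favours and away from the word favoured by topic 1, while the adapted distribution still sums to one. The same scaling could in principle be applied to lexicon entries, but how the paper does this is not specified in the abstract.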