Topic adaptation for lecture translation through bilingual latent semantic models

Authors:
Nick Ruiz; Marcello Federico
Affiliations:
Free University of Bozen-Bolzano, Bolzano, Italy; FBK-irst, Fondazione Bruno Kessler, Trento, Italy
Venue:
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Year:
2011

Abstract

This work presents a simplified approach to bilingual topic modeling for language model adaptation. Text in the source and target languages is combined into very short documents, and Probabilistic Latent Semantic Analysis (PLSA) is performed on them during model training. During inference, documents containing only source-language text can be used to infer a full topic-word distribution over all words in the target-language vocabulary, from which we perform Minimum Discrimination Information (MDI) adaptation of a background language model (LM). We apply our approach to the English-French IWSLT 2010 TED Talk exercise and report a 15% reduction in perplexity and relative BLEU and NIST improvements of 3% and 2.4%, respectively, over a baseline that uses only a 5-gram background LM, across the entire translation task. Our topic modeling approach is simpler to construct than other bilingual topic modeling approaches.
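The pipeline summarized in the abstract (bilingual PLSA training, inference on a source-only document, and MDI adaptation of a background LM) can be illustrated with a small sketch. It is not the authors' implementation. The toy corpus, the `fr:` prefix used to keep the two vocabularies apart, the number of topics, the iteration counts, the background probabilities for a single history h, and gamma = 0.5 are all assumptions chosen for illustration. Inference is done by PLSA "fold-in" (EM on P(z|d) with P(w|z) held fixed). MDI is shown in its common unigram-scaling form, P_A(w|h) = P_B(w|h) * alpha(w) / Z(h) with alpha(w) = (P_A(w) / P_B(w))^gamma, where P_A(w) is the topic-based target unigram and P_B(w) is the background unigram.

```python
import random
from collections import Counter

# Hypothetical toy corpus: each "document" joins one source sentence with its
# translation. Target words carry an "fr:" prefix so both languages share a
# single vocabulary without collisions.
DOCS = [
    "en:climate en:solar fr:climat fr:solaire",
    "en:solar en:power fr:solaire fr:puissance",
    "en:brain en:neuron fr:cerveau fr:neurone",
    "en:neuron en:cell fr:neurone fr:cellule",
]


def normalize(xs):
    """Scale non-negative weights to sum to 1 (uniform if they sum to 0)."""
    s = sum(xs)
    return [x / s for x in xs] if s > 0 else [1.0 / len(xs)] * len(xs)


def plsa_train(docs, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM; return the vocabulary and P(w|z), one row per topic."""
    vocab = sorted({w for d in docs for w in d.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    counts = [Counter(d.split()) for d in docs]
    rng = random.Random(seed)
    p_w_z = [normalize([rng.random() + 0.1 for _ in vocab]) for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in docs]
    for _ in range(n_iter):
        acc_w_z = [[0.0] * len(vocab) for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in docs]
        for d, cnt in enumerate(counts):
            for w, n in cnt.items():
                i = idx[w]
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                post = normalize([p_z_d[d][z] * p_w_z[z][i] for z in range(n_topics)])
                for z in range(n_topics):
                    acc_w_z[z][i] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: re-estimate both distributions from expected counts
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return vocab, p_w_z


def fold_in(words, vocab, p_w_z, n_iter=100):
    """Estimate P(z|d) for a new document while keeping P(w|z) fixed."""
    idx = {w: i for i, w in enumerate(vocab)}
    cnt = Counter(w for w in words if w in idx)  # out-of-vocabulary words are skipped
    k = len(p_w_z)
    p_z = [1.0 / k] * k
    for _ in range(n_iter):
        acc = [0.0] * k
        for w, n in cnt.items():
            post = normalize([p_z[z] * p_w_z[z][idx[w]] for z in range(k)])
            for z in range(k):
                acc[z] += n * post[z]
        p_z = normalize(acc)
    return p_z


def target_unigram(p_z, vocab, p_w_z, prefix="fr:"):
    """P(w|d) = sum_z P(w|z) P(z|d), restricted to target words and renormalized."""
    scores = {w: sum(p_z[z] * p_w_z[z][i] for z in range(len(p_z)))
              for i, w in enumerate(vocab) if w.startswith(prefix)}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def mdi_adapt(p_bg_given_h, p_bg_unigram, p_adapt_unigram, gamma=0.5):
    """MDI unigram scaling for one history h:
    P_A(w|h) = P_B(w|h) * alpha(w) / Z(h), alpha(w) = (P_A(w) / P_B(w)) ** gamma."""
    scaled = {w: p * (p_adapt_unigram[w] / p_bg_unigram[w]) ** gamma
              for w, p in p_bg_given_h.items()}
    z = sum(scaled.values())  # Z(h) renormalizes over the target vocabulary
    return {w: s / z for w, s in scaled.items()}


vocab, p_w_z = plsa_train(DOCS, n_topics=2)
p_z = fold_in("en:solar en:power".split(), vocab, p_w_z)  # source-only document
p_adapt = target_unigram(p_z, vocab, p_w_z)

# Background unigram P_B(w) from target-side counts of the toy corpus
target_counts = Counter(w for d in DOCS for w in d.split() if w.startswith("fr:"))
n_target = sum(target_counts.values())
p_bg_unigram = {w: c / n_target for w, c in target_counts.items()}
# Hypothetical background LM probabilities P_B(w|h) for a single history h
p_bg_given_h = {"fr:climat": 0.1, "fr:solaire": 0.2, "fr:puissance": 0.1,
                "fr:cerveau": 0.2, "fr:neurone": 0.3, "fr:cellule": 0.1}
adapted = mdi_adapt(p_bg_given_h, p_bg_unigram, p_adapt)
print(sorted(adapted.items(), key=lambda kv: -kv[1]))
```

Setting gamma to 0 leaves the background LM unchanged, and larger values let the topic-based unigram pull harder on P(w|h). A real adapter would need to apply the scaling to every history of an n-gram back-off model and compute Z(h) efficiently; the single history here only shows the arithmetic.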