Cross-lingual lexical triggers in statistical language modeling

Authors:
Woosung Kim; Sanjeev Khudanpur
Affiliations:
The Johns Hopkins University, Baltimore, MD; The Johns Hopkins University, Baltimore, MD
Venue:
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Year:
2003

Abstract

We propose new methods to take advantage of text in resource-rich languages to sharpen statistical language models in resource-deficient languages. We achieve this through an extension of the method of lexical triggers to the cross-language problem, and by developing a likelihood-based adaptation scheme for combining a trigger model with an N-gram model. We describe the application of such language models for automatic speech recognition. By exploiting a side-corpus of contemporaneous English news articles for adapting a static Chinese language model to transcribe Mandarin news stories, we demonstrate significant reductions in both perplexity and recognition errors. We also compare our cross-lingual adaptation scheme to monolingual language model adaptation, and to an alternate method for exploiting cross-lingual cues, via cross-lingual information retrieval and machine translation, proposed elsewhere.
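
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the trigger model takes the form of a unigram distribution over Chinese words induced from the English side document, that the combination is a linear interpolation with a static bigram model, and that the interpolation weight is chosen by a grid search maximizing the likelihood of some Chinese text. The trigger table, the bigram table, the romanized tokens, the back-off floor, and all function names are hypothetical and the numbers are made up.

```python
import math
from collections import Counter

# Toy, hypothetical trigger table: P_T(chinese_word | english_word).
# Romanized tokens stand in for Chinese words; all numbers are made up.
TRIGGERS = {
    "president": {"zongtong": 0.8, "zhuxi": 0.2},
    "market": {"shichang": 1.0},
}

# Toy static bigram scores P(word | previous_word), also made up
# (not a complete, normalized model).
BIGRAM = {
    ("<s>", "zongtong"): 0.1,
    ("zongtong", "fangwen"): 0.3,
    ("fangwen", "shichang"): 0.05,
}
FLOOR = 1e-4  # probability assigned to unseen bigrams (assumption)


def trigger_unigram(english_doc, triggers):
    """P_trig(c | d_E) = sum_e P_T(c | e) * P(e | d_E), renormalized
    over the English words that appear in the trigger table."""
    counts = Counter(english_doc)
    total = sum(counts.values())
    p_c = Counter()
    for e, n in counts.items():
        for c, p in triggers.get(e, {}).items():
            p_c[c] += p * n / total
    mass = sum(p_c.values())
    return {c: p / mass for c, p in p_c.items()} if mass else {}


def adapted_prob(word, prev, p_trig, lam):
    """Linear interpolation of the trigger unigram and the static bigram."""
    p_ngram = BIGRAM.get((prev, word), FLOOR)
    return lam * p_trig.get(word, 0.0) + (1.0 - lam) * p_ngram


def log_likelihood(tokens, p_trig, lam):
    """Log-likelihood of a token sequence under the interpolated model."""
    prev, total = "<s>", 0.0
    for w in tokens:
        total += math.log(adapted_prob(w, prev, p_trig, lam))
        prev = w
    return total


def pick_lambda(tokens, p_trig, grid=None):
    """Grid search for the weight that maximizes the likelihood of tokens."""
    grid = grid or [i / 20 for i in range(1, 20)]
    return max(grid, key=lambda lam: log_likelihood(tokens, p_trig, lam))


english_side = ["president", "market", "president", "visit"]
chinese_text = ["zongtong", "fangwen", "shichang"]

p_trig = trigger_unigram(english_side, TRIGGERS)
best_lam = pick_lambda(chinese_text, p_trig)
```

The grid excludes 0 and 1 so that neither component can zero out the interpolated probability of a word the other covers; which text the weight is tuned on is left open here and would depend on the actual adaptation setup.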