In-domain texts for estimating statistical language models are not easily found for most languages of the world. We present two techniques to take advantage of in-domain text resources in other languages. First, we extend the notion of lexical triggers, which have been used monolingually for language model adaptation, to the cross-lingual problem, permitting the construction of sharper language models for a target-language document by drawing statistics from related documents in a resource-rich language. Next, we show that cross-lingual latent semantic analysis is similarly capable of extracting useful statistics for language modeling. Neither technique requires explicit translation capabilities between the two languages. We demonstrate significant reductions in both perplexity and word error rate on a Mandarin speech recognition task by using these techniques.
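To make the idea of a cross-lingual lexical trigger concrete, the following is a minimal sketch, not the method as implemented in this work. It assumes a collection of aligned document pairs (one document in the resource-rich language, one in the target language) and scores each word pair by the mutual information of their document-level presence. The scoring criterion, the function name `trigger_scores`, and the toy romanized tokens are all illustrative assumptions; the abstract does not specify them.

```python
import math
from collections import Counter
from itertools import product


def trigger_scores(doc_pairs):
    """Score (source_word, target_word) pairs by the mutual information of
    their presence across aligned document pairs.

    Assumption: each element of doc_pairs is (source_tokens, target_tokens)
    for two documents known to be topically related. No dictionary or
    translation system is used, only co-occurrence counts.
    """
    n = len(doc_pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src_doc, tgt_doc in doc_pairs:
        src_set, tgt_set = set(src_doc), set(tgt_doc)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        joint_count.update(product(src_set, tgt_set))

    scores = {}
    for (s, t), n11 in joint_count.items():
        n_s, n_t = src_count[s], tgt_count[t]
        # Cells of the 2x2 presence/absence table with their marginals.
        cells = (
            (n11, n_s, n_t),                          # both present
            (n_s - n11, n_s, n - n_t),                # source only
            (n_t - n11, n - n_s, n_t),                # target only
            (n - n_s - n_t + n11, n - n_s, n - n_t),  # neither
        )
        mi = 0.0
        for n_cell, row, col in cells:
            if n_cell > 0:  # empty cells contribute nothing
                mi += (n_cell / n) * math.log(n_cell * n / (row * col))
        scores[(s, t)] = mi
    return scores


# Hypothetical toy data: English documents paired with romanized Mandarin.
toy_pairs = [
    (["bank", "loan"], ["yinhang", "daikuan"]),
    (["bank", "rate"], ["yinhang", "lilv"]),
    (["match", "goal"], ["bisai", "jinqiu"]),
    (["match", "team"], ["bisai", "qiudui"]),
]
toy_scores = trigger_scores(toy_pairs)
best = max(toy_scores, key=toy_scores.get)
print(best, round(toy_scores[best], 3))
```

In this toy collection, pairs such as ("bank", "yinhang") that always appear together score highest, while pairs that never co-occur are not scored at all. High-scoring pairs could then be used to bias a target-language model toward words triggered by the related source-language document; how that bias is applied is left open here.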