Lexical triggers and latent semantic analysis for cross-lingual language model adaptation

  • Authors:
  • Woosung Kim; Sanjeev Khudanpur

  • Affiliations:
  • The Johns Hopkins University, Baltimore, MD; The Johns Hopkins University, Baltimore, MD

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2004

Abstract

In-domain texts for estimating statistical language models are not easily found for most languages of the world. We present two techniques to take advantage of in-domain text resources in other languages. First, we extend the notion of lexical triggers, which have been used monolingually for language model adaptation, to the cross-lingual problem, permitting the construction of sharper language models for a target-language document by drawing statistics from related documents in a resource-rich language. Next, we show that cross-lingual latent semantic analysis is similarly capable of extracting useful statistics for language modeling. Neither technique requires explicit translation capabilities between the two languages! We demonstrate significant reductions in both perplexity and word error rate on a Mandarin speech recognition task by using these techniques.
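The cross-lingual trigger idea can be illustrated with a small sketch. The abstract does not give the estimator, so everything below is an assumption made for illustration: trigger strength is taken to be the mutual information between document-level word occurrence in aligned document pairs, the adapted unigram distribution is a trigger-weighted mixture over the words of the source-language document, and it is linearly interpolated with a baseline model. The function names (`trigger_scores`, `trigger_unigram`, `interpolate`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mutual_information(n_ec, n_e, n_c, n):
    """Mutual information between the binary events 'e occurs in the source
    document' and 'c occurs in the aligned target document', estimated from
    document counts over n aligned pairs."""
    mi = 0.0
    for e_present in (True, False):
        for c_present in (True, False):
            if e_present and c_present:
                joint = n_ec
            elif e_present:
                joint = n_e - n_ec
            elif c_present:
                joint = n_c - n_ec
            else:
                joint = n - n_e - n_c + n_ec
            if joint == 0:
                continue  # an empty cell contributes nothing
            p_joint = joint / n
            p_e = (n_e if e_present else n - n_e) / n
            p_c = (n_c if c_present else n - n_c) / n
            mi += p_joint * math.log(p_joint / (p_e * p_c))
    return mi


def trigger_scores(doc_pairs):
    """Score every (source word, target word) pair that co-occurs in at
    least one aligned document pair. No translation lexicon is used."""
    n = len(doc_pairs)
    src_df, tgt_df, joint_df = Counter(), Counter(), Counter()
    for src_doc, tgt_doc in doc_pairs:
        s, t = set(src_doc), set(tgt_doc)
        src_df.update(s)
        tgt_df.update(t)
        joint_df.update((e, c) for e in s for c in t)
    return {
        (e, c): mutual_information(n_ec, src_df[e], tgt_df[c], n)
        for (e, c), n_ec in joint_df.items()
    }


def trigger_unigram(src_doc, scores):
    """Assumed form: P(c | src_doc) = sum_e P(c | e) * P(e | src_doc), with
    P(c | e) proportional to the trigger score of (e, c)."""
    by_source = defaultdict(dict)
    for (e, c), s in scores.items():
        if s > 0:
            by_source[e][c] = s
    counts = Counter(e for e in src_doc if e in by_source)
    total = sum(counts.values())
    probs = defaultdict(float)
    for e, k in counts.items():
        z = sum(by_source[e].values())
        for c, s in by_source[e].items():
            probs[c] += (k / total) * s / z
    return dict(probs)


def interpolate(p_trig, p_base, lam=0.5):
    """Linear interpolation of the trigger model with a baseline model."""
    vocab = set(p_trig) | set(p_base)
    return {w: lam * p_trig.get(w, 0.0) + (1 - lam) * p_base.get(w, 0.0)
            for w in vocab}


# Toy aligned corpus (hypothetical): English source, romanized Mandarin target.
doc_pairs = [
    (["bank", "money"], ["yinhang", "qian"]),
    (["bank", "loan"], ["yinhang", "daikuan"]),
    (["river", "water"], ["he", "shui"]),
    (["river", "boat"], ["he", "chuan"]),
]
scores = trigger_scores(doc_pairs)
adapted = trigger_unigram(["bank", "money"], scores)
```

In this toy setting, an English document about banking shifts probability mass toward the Mandarin words that co-occurred with "bank" and "money" in the aligned pairs, which is the sense in which the adapted model is sharper. A real system would apply the interpolation to an n-gram model and tune the weight on held-out data; the weight of 0.5 here is arbitrary.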