Language model adaptation using machine-translated text for resource-deficient languages

  • Authors:
  • Arnar Thor Jensson, Koji Iwano, Sadaoki Furui

  • Affiliations:
  • Department of Computer Science, Tokyo Institute of Technology, Ookayama, Tokyo, Japan (all authors)

  • Venue:
  • EURASIP Journal on Audio, Speech, and Music Processing
  • Year:
  • 2008


Abstract

Text corpus size is an important issue when building a language model (LM), and it is particularly critical for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data machine-translated (MT) from English to Icelandic on both a word-by-word and a sentence-by-sentence basis. Interpolating the baseline LM with an LM built from either the word-by-word or the sentence-by-sentence translated text significantly reduced the word error rate when the manually obtained utterances used to build the baseline LM were very sparse.
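For readers unfamiliar with LM interpolation, the sketch below illustrates the basic idea of linearly combining a baseline LM with one estimated from machine-translated text. It is a minimal unigram example under stated assumptions: the corpora, vocabulary, and interpolation weight are illustrative placeholders, not the paper's actual Icelandic data or tuned values.

```python
# Minimal sketch of linear LM interpolation, assuming simple unigram
# probability tables. The corpora, vocabulary, and the weight LAMBDA are
# illustrative placeholders, not the authors' data or tuned parameters.
from collections import Counter

LAMBDA = 0.7  # assumed weight on the baseline (task-dependent) LM


def unigram_lm(corpus, vocab):
    """Build a unigram LM with add-one smoothing over a fixed vocabulary."""
    counts = Counter(w for sentence in corpus for w in sentence.split())
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def interpolate(p_baseline, p_translated, lam=LAMBDA):
    """Linearly combine two LMs: P(w) = lam * P_base(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_baseline[w] + (1 - lam) * p_translated[w] for w in p_baseline}


if __name__ == "__main__":
    # Tiny illustrative corpora: a sparse task-dependent set and a larger
    # machine-translated set (placeholders, not Icelandic text).
    task_corpus = ["book a flight to reykjavik"]
    mt_corpus = ["i want to book a ticket", "a flight to london please"]
    vocab = sorted(set(" ".join(task_corpus + mt_corpus).split()))

    p_base = unigram_lm(task_corpus, vocab)
    p_mt = unigram_lm(mt_corpus, vocab)
    p_adapted = interpolate(p_base, p_mt)
    print({w: round(p, 3) for w, p in p_adapted.items()})
```

In practice the same weighted combination is applied to n-gram probabilities, and the interpolation weight is typically tuned on held-out task-dependent data rather than fixed in advance.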