Statistical machine translation and automatic speech recognition under uncertainty

  • Authors:
  • William J. Byrne; Lambert Mathias

  • Affiliations:
  • The Johns Hopkins University; The Johns Hopkins University

  • Venue:
  • Statistical machine translation and automatic speech recognition under uncertainty

  • Year:
  • 2008

Abstract

Statistical modeling techniques have been applied successfully to natural language processing tasks such as automatic speech recognition (ASR) and statistical machine translation (SMT). Since most statistical approaches rely heavily on the availability of data and on the underlying model assumptions, reducing uncertainty is critical to their optimal performance. In speech translation, the uncertainty is due to the speech input to the SMT system, whose elements are represented as distributions over sequences. A novel approach to statistical phrase-based speech translation is proposed. This approach is based on a generative, source-channel model of translation, similar in spirit to the modeling approaches that underlie hidden Markov model (HMM)-based ASR systems: in fact, our model of speech-to-text translation contains the acoustic models of a large-vocabulary ASR system as one of its components. This model of speech-to-text translation is developed as a direct extension of the phrase-based models used in text translation systems. Speech is translated by mapping ASR word lattices to lattices of phrase sequences, which are then translated using operations developed for text translation. Efficient phrase extraction from ASR lattices, as well as word- and phrase-level pruning strategies for speech translation, are investigated to reduce uncertainty in the translation of speech.

In order to achieve good translation performance, it is necessary to find optimal parameters under a particular training objective. Two different discriminative training objective functions are investigated: Maximum Mutual Information (MMI) and Expected BLEU. A novel iterative optimization procedure using growth transformations is proposed as a parameter update procedure for these training criteria. The translation performance obtained with growth-transformation-based updates is investigated in detail.

Training highly accurate ASR systems requires speech corpora with reliable verbatim transcripts. However, accurately transcribed training data are not always available, and manually generating them is not always feasible. A novel lightly supervised approach to training acoustic models is presented that leverages information from non-literal transcripts. In particular, a method for discriminatively training acoustic models using non-literal transcripts is presented. Reliable segments of the acoustic frame sequence are automatically identified, and the unreliable frames are filtered out during model parameter estimation.
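For orientation, the source-channel model of speech-to-text translation described in the abstract can be written in a standard formulation (the exact factorization used in the thesis may differ) as

  \hat{E} = \operatorname*{argmax}_{E} \; P(E) \sum_{F} P(F \mid E)\, P(O \mid F),

where O is the acoustic observation sequence, F a source-language word (or phrase) sequence hypothesized by the ASR component, and E a target-language sentence. In practice the sum over F is restricted to an ASR word lattice and is often approximated by its best path, which is consistent with the lattice-to-lattice translation described above.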
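The two discriminative training criteria named in the abstract take, in their usual generic forms over a training set {(F_r, E_r)} with model parameters \theta (notation here is generic, not necessarily that of the thesis),

  \mathcal{F}_{\mathrm{MMI}}(\theta) = \sum_{r} \log P_\theta(E_r \mid F_r),
  \qquad
  \mathcal{F}_{\mathrm{xBLEU}}(\theta) = \sum_{r} \sum_{E} P_\theta(E \mid F_r)\, \mathrm{BLEU}(E, E_r).

A growth transformation re-estimates a parameter block {\theta_i} that forms a probability distribution so that the objective is guaranteed not to decrease; the generic update of this kind is

  \theta_i' = \frac{\theta_i \left( \partial \mathcal{F}/\partial \theta_i + C \right)}{\sum_j \theta_j \left( \partial \mathcal{F}/\partial \theta_j + C \right)}

for a sufficiently large constant C. The thesis derives updates of this kind that are specific to its translation models.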
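A minimal sketch of the frame-filtering idea behind the lightly supervised training described above, under the assumption that a frame is treated as reliable only where a forced alignment of the non-literal transcript agrees with an unconstrained ASR hypothesis; all names here are illustrative and not taken from the thesis.

```python
def select_reliable_frames(aligned_transcript, aligned_hypothesis, num_frames):
    """Mark each acoustic frame as reliable (True) or unreliable (False).

    aligned_transcript / aligned_hypothesis: lists of (word, start_frame, end_frame)
    tuples, e.g. from a forced alignment of the non-literal transcript and from an
    unconstrained ASR decode of the same audio. A frame counts as reliable only
    if both alignments cover it with the same word.
    """
    def frame_labels(alignment):
        labels = [None] * num_frames
        for word, start, end in alignment:
            for t in range(max(start, 0), min(end, num_frames)):
                labels[t] = word
        return labels

    ref = frame_labels(aligned_transcript)
    hyp = frame_labels(aligned_hypothesis)
    return [r is not None and r == h for r, h in zip(ref, hyp)]


# Unreliable frames are simply skipped when accumulating the statistics
# used for (discriminative) acoustic model parameter estimation.
def filter_frames(feature_frames, reliable_mask):
    return [x for x, keep in zip(feature_frames, reliable_mask) if keep]
```

In a real system such a mask would be applied to the sufficient statistics accumulated during training rather than to raw feature frames, but the agreement-based selection criterion is the same.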