Stereo hidden Markov modeling for noise robust speech recognition

  • Authors:
  • Xiaodong Cui; Mohamed Afify; Yuqing Gao; Bowen Zhou

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA; Orange Lab, Smart Village, Cairo, Egypt; IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA; IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2013

Abstract

This paper investigates a noise-robust technique for automatic speech recognition that exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as a stereo HMM, has in each state a Gaussian mixture model (GMM) that models the joint distribution of clean and noisy speech features. Given noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process: minimum mean square error (MMSE) denoising based on N-best hypotheses is performed first, followed by decoding of the denoised speech in a reduced search space on a lattice. Compared with feature-space GMM-based denoising approaches, the stereo HMM offers finer-grained noise compensation and uses information from the whole noisy feature sequence to predict each individual clean feature. Experiments on large-vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique outperforms its feature-space counterpart in noisy conditions while maintaining decent performance in clean conditions.
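The feature-space GMM baseline the abstract compares against is commonly formulated as a frame-level MMSE estimate of a clean feature x from a noisy feature y under a joint GMM: x̂ = Σ_k P(k | y) (μ_x,k + Σ_xy,k Σ_yy,k⁻¹ (y − μ_y,k)), where P(k | y) comes from each component's noisy marginal. The sketch below implements that standard estimate for scalar features in plain Python. It is an illustration under stated assumptions, not the paper's method: features are assumed one-dimensional, parameters are hand-picked, and the names `JointComponent` and `mmse_clean_estimate` are hypothetical. Per the abstract, the stereo HMM instead applies joint GMMs at the HMM-state level, with posteriors driven by N-best hypotheses over the whole noisy sequence.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """Hypothetical container: one joint Gaussian over a scalar (clean x, noisy y) pair."""
    weight: float  # mixture weight w_k
    mu_x: float    # clean mean
    mu_y: float    # noisy mean
    var_x: float   # Sigma_xx (not needed by the estimator, kept for completeness)
    var_y: float   # Sigma_yy
    cov_xy: float  # Sigma_xy


def log_gaussian(v: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def mmse_clean_estimate(y: float, components: list[JointComponent]) -> float:
    """MMSE estimate E[x | y] under a joint GMM (assumed scalar features)."""
    # Component posteriors P(k | y) from the noisy marginals, computed in the
    # log domain and shifted by the maximum to avoid underflow.
    log_likes = [math.log(c.weight) + log_gaussian(y, c.mu_y, c.var_y)
                 for c in components]
    peak = max(log_likes)
    unnorm = [math.exp(l - peak) for l in log_likes]
    total = sum(unnorm)

    estimate = 0.0
    for c, u in zip(components, unnorm):
        # Conditional mean of x given y for a jointly Gaussian pair.
        cond_mean = c.mu_x + (c.cov_xy / c.var_y) * (y - c.mu_y)
        estimate += (u / total) * cond_mean
    return estimate


if __name__ == "__main__":
    gmm = [
        JointComponent(0.5, 0.0, 0.0, 1.0, 1.0, 0.5),
        JointComponent(0.5, 10.0, 10.0, 1.0, 1.0, 0.5),
    ]
    print(mmse_clean_estimate(5.0, gmm))
```

With the two symmetric components above, y = 5.0 receives equal posteriors, so the estimate is the average of the two conditional means (2.5 and 7.5). A single global GMM like this shares one set of components across all frames; the finer-grained compensation the abstract attributes to the stereo HMM comes from making these components state-dependent.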