Stereo hidden Markov modeling for noise robust speech recognition

Authors:
Xiaodong Cui; Mohamed Afify; Yuqing Gao; Bowen Zhou
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA; Orange Lab, Smart Village, Cairo, Egypt; IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA; IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Venue:
Computer Speech and Language
Year:
2013

Abstract

This paper investigates a noise-robust technique for automatic speech recognition that applies hidden Markov modeling to stereo speech features from clean and noisy channels. The HMM trained this way, referred to as a stereo HMM, has in each state a Gaussian mixture model (GMM) representing the joint distribution of clean and noisy speech features. Given noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process: MMSE denoising based on N-best hypotheses is performed first, followed by decoding of the denoised speech in a reduced search space on a lattice. Compared with feature-space GMM-based denoising approaches, the stereo HMM offers finer-grained noise compensation and uses information from the whole noisy feature sequence to predict each individual clean feature. Experiments on large-vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique outperforms its feature-space counterpart in noisy conditions while maintaining decent performance in clean conditions.
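
The MMSE denoising step mentioned in the abstract can be illustrated with the textbook conditional-mean estimator for a joint Gaussian mixture over clean and noisy features. The sketch below is not the authors' implementation. It assumes scalar features for brevity, and the names `JointGaussian` and `mmse_clean_estimate` are hypothetical. It computes component posteriors from a single noisy frame, which corresponds to the feature-space GMM case; the stereo HMM described above would presumably derive those weights from N-best hypotheses over the whole utterance, which this sketch does not model.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JointGaussian:
    """One mixture component over a scalar (clean x, noisy y) pair (hypothetical type)."""
    weight: float   # mixture weight
    mean_x: float   # mean of the clean feature
    mean_y: float   # mean of the noisy feature
    var_y: float    # variance of the noisy feature
    cov_xy: float   # covariance between clean and noisy features


def gaussian_pdf(value: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian evaluated at `value`."""
    return math.exp(-0.5 * (value - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_clean_estimate(y: float, components: Sequence[JointGaussian]) -> float:
    """Return E[x | y] under the joint mixture.

    Each component contributes its conditional mean
    mean_x + (cov_xy / var_y) * (y - mean_y), weighted by the posterior
    probability of that component given the noisy observation y.
    """
    likelihoods = [c.weight * gaussian_pdf(y, c.mean_y, c.var_y) for c in components]
    total = sum(likelihoods)
    estimate = 0.0
    for comp, lik in zip(components, likelihoods):
        posterior = lik / total
        conditional_mean = comp.mean_x + (comp.cov_xy / comp.var_y) * (y - comp.mean_y)
        estimate += posterior * conditional_mean
    return estimate


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    mixture = [
        JointGaussian(weight=0.5, mean_x=-1.0, mean_y=-1.0, var_y=1.0, cov_xy=0.0),
        JointGaussian(weight=0.5, mean_x=1.0, mean_y=1.0, var_y=1.0, cov_xy=0.0),
    ]
    print(mmse_clean_estimate(0.0, mixture))
```

A practical version would work with feature vectors and full or diagonal covariance blocks, and would compute the posteriors in the log domain to avoid underflow when the noisy observation lies far from every component.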