Computer Speech and Language
This paper investigates a noise-robust technique for automatic speech recognition that exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as a stereo HMM, has in each state a Gaussian mixture model (GMM) over the joint distribution of clean and noisy speech features. Given noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process: MMSE denoising based on N-best hypotheses is performed first, and the denoised speech is then decoded in a reduced search space on a lattice. Compared with feature-space GMM-based denoising approaches, the stereo HMM offers finer-grained noise compensation and uses information from the whole noisy feature sequence to predict each individual clean feature. Experiments on large-vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique outperforms its feature-space counterpart in noisy conditions while maintaining decent performance in clean conditions.
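As a rough illustration of the MMSE denoising step mentioned above, the sketch below computes the conditional-mean estimate of a clean feature from a noisy one under a joint Gaussian mixture. It is not the paper's implementation. It assumes scalar features, a single frame-level GMM rather than HMM states weighted along N-best hypotheses, and the textbook conditional-Gaussian formula. The function names and dictionary keys (`mmse_clean_estimate`, `w`, `mx`, `my`, `sxy`, `syy`) and all numeric values are hypothetical.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_clean_estimate(y, components):
    """MMSE estimate of a scalar clean feature x given a scalar noisy feature y.

    Each component is a dict with hypothetical keys:
      w   - mixture weight
      mx  - mean of the clean feature
      my  - mean of the noisy feature
      sxy - covariance between clean and noisy features
      syy - variance of the noisy feature
    """
    # Unnormalised component posteriors: w_k * N(y; my_k, syy_k).
    likes = [c["w"] * gaussian_pdf(y, c["my"], c["syy"]) for c in components]
    total = sum(likes)

    estimate = 0.0
    for like, c in zip(likes, components):
        posterior = like / total
        # Conditional mean of x given y for a jointly Gaussian pair.
        cond_mean = c["mx"] + c["sxy"] / c["syy"] * (y - c["my"])
        estimate += posterior * cond_mean
    return estimate


# Toy joint GMM with two components (illustrative values only).
components = [
    {"w": 0.5, "mx": 0.0, "my": 1.0, "sxy": 0.8, "syy": 1.0},
    {"w": 0.5, "mx": 4.0, "my": 6.0, "sxy": 0.5, "syy": 1.0},
]
x_hat = mmse_clean_estimate(1.0, components)
```

In this toy setup the noisy observation sits at the first component's noisy mean, so the estimate is dominated by that component's clean mean. Under the stereo HMM described in the abstract, the component posteriors would instead depend on the whole noisy sequence through the N-best hypotheses. A practical version would also work in the log domain to avoid underflow.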