We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Unlike the log-MMSE spectral-amplitude noise suppressor proposed by Ephraim and Malah (E&M), our algorithm minimizes the error expressed explicitly in the Mel-frequency cepstra rather than in the discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. As a consequence, the statistics used to estimate the suppression factor differ substantially from those used in the E&M log-MMSE suppressor. Our algorithm is also significantly more efficient than the E&M log-MMSE suppressor, since the number of channels in the Mel-frequency filter bank (23 in our case) is much smaller than the number of DFT bins (256). We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The results demonstrate a 48% reduction in word error rate over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M log-MMSE noise suppressor. The experiments also show that the new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors, respectively, than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings.
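To make the feature-domain suppression idea concrete, the sketch below applies a per-channel gain to a frame of Mel filter-bank energies, using a decision-directed a-priori SNR estimate in the spirit of E&M. This is an illustrative simplification, not the paper's MMSE-derived gain for Mel-frequency cepstra: the Wiener-style suppression rule, the smoothing constant `alpha`, and all function and variable names here are assumptions chosen for clarity. Note that the per-frame work scales with the 23 Mel channels rather than 256 DFT bins, which is the efficiency advantage the abstract describes.

```python
import numpy as np

N_MEL = 23  # number of Mel filter-bank channels, as in the abstract

def suppress_frame(noisy_mel, noise_mel, prev_clean_mel, alpha=0.98):
    """Apply a per-channel suppression gain to one frame of Mel
    filter-bank energies (illustrative Wiener-style rule, not the
    paper's exact MMSE gain).

    noisy_mel      -- (N_MEL,) noisy filter-bank energies for this frame
    noise_mel      -- (N_MEL,) running noise-energy estimate
    prev_clean_mel -- (N_MEL,) previous frame's enhanced energies
    """
    # A-posteriori SNR per Mel channel
    gamma = noisy_mel / np.maximum(noise_mel, 1e-10)
    # Decision-directed a-priori SNR estimate (E&M-style smoothing)
    xi = (alpha * prev_clean_mel / np.maximum(noise_mel, 1e-10)
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
    # Suppression factor in [0, 1): strong attenuation at low SNR
    gain = xi / (1.0 + xi)
    return gain * noisy_mel

# Toy usage: one frame of synthetic clean energies plus a flat noise floor
rng = np.random.default_rng(0)
noise = np.full(N_MEL, 1.0)
clean = rng.uniform(0.0, 10.0, N_MEL)
noisy = clean + noise
enhanced = suppress_frame(noisy, noise, clean)
```

In a full front end, the enhanced energies would then pass through the log and discrete cosine transform to yield Mel-frequency cepstral coefficients for the recognizer.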