We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Unlike the log-MMSE spectral-amplitude noise suppressor proposed by Ephraim and Malah (E&M), our algorithm minimizes the error expressed explicitly in the Mel-frequency cepstra rather than in the discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. As a consequence, the statistics used to estimate the suppression factor differ substantially from those used in the E&M log-MMSE suppressor. Our algorithm is also significantly more efficient than the E&M log-MMSE suppressor, since the number of channels in the Mel-frequency filter bank (23 in our case) is much smaller than the number of DFT bins (256). We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The results demonstrate a 48% reduction in word error rate over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M log-MMSE noise suppressor. The experiments also show that the new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors, respectively, than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings.
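To make the feature-domain suppression idea concrete, the sketch below applies a per-channel gain to a frame of Mel filter-bank energies, using a decision-directed a-priori SNR estimate in the spirit of E&M. This is an illustrative simplification, not the paper's MMSE-derived gain for Mel-frequency cepstra: the Wiener-style suppression rule, the smoothing constant `alpha`, and all function and variable names here are assumptions chosen for clarity. Note that the per-frame work scales with the 23 Mel channels rather than 256 DFT bins, which is the efficiency advantage the abstract describes.

```python
import numpy as np

N_MEL = 23  # number of Mel filter-bank channels, as in the abstract

def suppress_frame(noisy_mel, noise_mel, prev_clean_mel, alpha=0.98):
    """Apply a per-channel suppression gain to one frame of Mel
    filter-bank energies (illustrative Wiener-style rule, not the
    paper's exact MMSE gain).

    noisy_mel      -- (N_MEL,) noisy filter-bank energies for this frame
    noise_mel      -- (N_MEL,) running noise-energy estimate
    prev_clean_mel -- (N_MEL,) previous frame's enhanced energies
    """
    # A-posteriori SNR per Mel channel
    gamma = noisy_mel / np.maximum(noise_mel, 1e-10)
    # Decision-directed a-priori SNR estimate (E&M-style smoothing)
    xi = (alpha * prev_clean_mel / np.maximum(noise_mel, 1e-10)
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
    # Suppression factor in [0, 1): strong attenuation at low SNR
    gain = xi / (1.0 + xi)
    return gain * noisy_mel

# Toy usage: one frame of synthetic clean energies plus a flat noise floor
rng = np.random.default_rng(0)
noise = np.full(N_MEL, 1.0)
clean = rng.uniform(0.0, 10.0, N_MEL)
noisy = clean + noise
enhanced = suppress_frame(noisy, noise, clean)
```

In a full front end, the enhanced energies would then pass through the log and discrete cosine transform to yield Mel-frequency cepstral coefficients for the recognizer.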