MMSE estimation of log-filterbank energies for robust speech recognition

Authors:
Anthony Stark;Kuldip Paliwal
Affiliations:
Signal Processing Laboratory, Griffith University, Nathan Campus, Brisbane QLD 4111, Australia;Signal Processing Laboratory, Griffith University, Nathan Campus, Brisbane QLD 4111, Australia
Venue:
Speech Communication
Year:
2011




Abstract

In this paper, we derive a minimum mean square error (MMSE) log-filterbank energy estimator for environment-robust automatic speech recognition. While several such estimators exist in the literature, most trade off simplifications of the log-filterbank noise distortion model against analytical tractability. To avoid this limitation, we extend a well-known spectral-domain noise distortion model for use in the log-filterbank energy domain. To this end, we develop several mathematical transformations that map spectral-domain models into filterbank and log-filterbank energy models. The resulting estimator allows robust estimation of both log-filterbank energies and the Mel-frequency cepstral coefficients derived from them. The proposed estimator is evaluated on the Aurora2 and RM speech recognition tasks, with results showing a significant reduction in word recognition error relative to both baseline results and several competing estimators.
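For readers unfamiliar with the feature domains named in the abstract, the following is a minimal sketch of how a power spectrum is conventionally mapped to filterbank energies, log-filterbank energies, and then cepstral coefficients. It is not the paper's MMSE estimator and includes no noise model. The function names, the triangular Mel filter design, the unnormalized DCT-II, and the parameter values (23 filters, 256-point FFT, 8 kHz sampling) are all assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # Common Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # Filter edges are equally spaced on the Mel scale.
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges = mel_to_hz(mel_edges)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle: the smaller of the two slopes, clipped at zero.
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_filterbank_energies(power_spectrum, fb, floor=1e-10):
    """Spectral domain -> filterbank energies -> log-filterbank energies."""
    energies = fb @ power_spectrum
    # Floor the energies so the logarithm stays finite.
    return np.log(np.maximum(energies, floor))


def mfcc_from_lfbe(lfbe, n_ceps):
    """Unnormalized DCT-II of the log-filterbank energies."""
    n = lfbe.shape[0]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return basis @ lfbe


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(256)            # stand-in for one speech frame
    power = np.abs(np.fft.rfft(frame)) ** 2     # power spectrum, 129 bins
    fb = mel_filterbank(23, 256, 8000)
    lfbe = log_filterbank_energies(power, fb)
    print(mfcc_from_lfbe(lfbe, 13))
```

Because the filterbank step is a weighted sum and the cepstral step is a linear transform, the logarithm is the only nonlinear stage in this chain, which is why a distortion model stated in the spectral domain needs explicit transformations before it can be applied to log-filterbank energies.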