Use of speech presence uncertainty with MMSE spectral energy estimation for robust automatic speech recognition

Authors:
Anthony Stark;Kuldip Paliwal
Affiliations:
Signal Processing Laboratory, Griffith University, Nathan, QLD 4111, Australia;Signal Processing Laboratory, Griffith University, Nathan, QLD 4111, Australia
Venue:
Speech Communication
Year:
2011

Abstract

In this paper, we investigate the minimum mean square error (MMSE) spectral energy estimator as a front end for environment-robust automatic speech recognition (ASR). In the past, it has been common to use the MMSE log-spectral amplitude estimator for this task. However, that estimator was originally derived under subjective human listening criteria, so its complex suppression rule may not be optimal for ASR. On the other hand, it can be shown that the MMSE spectral energy estimator is closely related to the MMSE Mel-frequency cepstral coefficient (MFCC) estimator. Despite this, the spectral energy estimator has tended to suffer from excessive residual noise. We examine the cause of this residual noise and show that introducing a heuristic-based speech presence uncertainty (SPU) can significantly improve its performance as a front-end ASR enhancement regime. The proposed spectral energy SPU estimator is evaluated on the Aurora2, RM and OLLO2 speech recognition tasks and is shown to significantly improve additive noise robustness over the more common spectral amplitude and log-spectral amplitude estimators.
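As a rough illustration of the idea, the sketch below applies a spectral energy gain to a single frequency bin and scales it by a speech presence probability. It is not the authors' method: the abstract does not specify the heuristic SPU, so the classical likelihood-ratio soft decision under a Gaussian model is assumed here as a stand-in. The function names, the prior speech-absence probability `q`, and the choice to treat the a priori SNR `xi` as conditional on speech presence are all assumptions made for this example.

```python
import math


def spectral_energy_gain(xi, gamma):
    """Power-domain gain of the MMSE spectral energy estimator (Gaussian model).

    xi    -- a priori SNR (assumed already estimated, e.g. decision-directed)
    gamma -- a posteriori SNR, |Y|^2 / noise energy
    """
    w = xi / (1.0 + xi)           # Wiener gain
    return w * (w + 1.0 / gamma)  # multiplies |Y|^2 to estimate E[|X|^2 | Y]


def speech_presence_probability(xi, gamma, q=0.5):
    """Posterior probability of speech presence (classical soft decision).

    q is a hypothetical prior probability of speech absence, not a value
    taken from the paper.
    """
    v = min(xi * gamma / (1.0 + xi), 500.0)  # clamp to avoid exp overflow
    ratio = (1.0 - q) / q * math.exp(v) / (1.0 + xi)
    return ratio / (1.0 + ratio)


def enhance_energy(noisy_energy, noise_energy, xi, q=0.5):
    """Estimate the clean spectral energy of one bin, weighted by the SPU."""
    gamma = max(noisy_energy / noise_energy, 1e-10)
    p = speech_presence_probability(xi, gamma, q)
    return p * spectral_energy_gain(xi, gamma) * noisy_energy


if __name__ == "__main__":
    # One bin with noisy energy 2.0, noise energy 1.0 and a priori SNR 1.0.
    print(enhance_energy(2.0, 1.0, 1.0))
```

Because the presence probability is at most one, the weighted estimate never exceeds the unweighted spectral energy estimate, which is one way a soft decision of this kind can reduce residual noise in bins where speech is unlikely.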