Recognition of isolated words based on psychoacoustics and neurobiology
Speech Communication - Neurospeech
Robust speech recognition using the modulation spectrogram
Speech Communication - Special issue on robust speech recognition
Combining speech enhancement and auditory feature extraction for robust speech recognition
Speech Communication - Special issue on noise robust ASR
Assessing local noise level estimation methods: application to noise robust ASR
Speech Communication - Special issue on noise robust ASR
Estimation of the signal-to-noise ratio with amplitude modulation spectrograms
Speech Communication
Subband-Based Speech Recognition
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) - Volume 2
ICICS'09 Proceedings of the 7th international conference on Information, communications and signal processing
This paper presents a new approach for estimating the long-term speech-to-noise ratio (SNR) in individual frequency bands, based on methods known from automatic speech recognition (ASR). It uses a model of auditory perception as the front end, physiologically and psychoacoustically motivated sigma-pi cells as secondary features, and a linear or non-linear neural network as the classifier. With a non-linear neural network back end, the SNR can be estimated in time segments of 1 s with a root-mean-square error of 5.68 dB on unknown test material. This performance is obtained on a large set of natural noise types, including non-stationary signals and alarm sounds; the SNR estimation works best, however, for more stationary types of noise. The individual components of the estimation algorithm are examined with respect to their importance for the estimation accuracy. Compared with other methods known from the literature for short-term SNR estimation, the algorithm presented here yields similar or better results with comparable computational effort. The new approach is based purely on slow spectro-temporal modulations and is therefore a valuable contribution to both digital hearing aids and ASR systems.
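To make the reported figure of merit concrete, the following is a minimal sketch of how a reference SNR in dB and a root-mean-square estimation error in dB could be computed. It is not the authors' implementation: the function names `snr_db` and `rmse_db`, the assumption that speech and noise samples are available separately for each band and segment, and the toy numbers are all illustrative assumptions.

```python
import math


def snr_db(speech, noise):
    """Reference SNR in dB for one band and one time segment.

    Assumes (hypothetically) that the speech and noise samples of the
    segment are known separately, as they would be for artificially
    mixed test material.
    """
    p_speech = sum(x * x for x in speech) / len(speech)  # mean speech power
    p_noise = sum(x * x for x in noise) / len(noise)     # mean noise power
    return 10.0 * math.log10(p_speech / p_noise)


def rmse_db(estimated, reference):
    """Root-mean-square error between estimated and reference SNRs (dB)."""
    if len(estimated) != len(reference):
        raise ValueError("estimated and reference must have equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy segment: speech amplitude 2, noise amplitude 1 (power ratio 4).
    speech = [2.0, -2.0, 2.0, -2.0]
    noise = [1.0, -1.0, 1.0, -1.0]
    print(round(snr_db(speech, noise), 2))

    # Toy estimates for two segments, with errors of 3 dB and 4 dB.
    print(round(rmse_db([5.0, 10.0], [2.0, 6.0]), 2))
```

In this reading, the 5.68 dB quoted in the abstract would correspond to `rmse_db` evaluated over all 1 s segments of the unknown test material, with the network outputs as `estimated` and SNRs derived from the known mixing conditions as `reference`.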