Sub-band SNR estimation using auditory feature processing

Authors:
Michael Kleinschmidt; Volker Hohmann
Affiliations:
Medizinische Physik, Universität Oldenburg, 26111 Oldenburg, Germany (both authors)
Venue:
Speech Communication - Special issue on speech processing for hearing aids
Year:
2003



Abstract

This paper presents a new approach for estimating the long-term speech-to-noise ratio (SNR) in individual frequency bands, based on methods known from automatic speech recognition (ASR). It uses a model of auditory perception as the front end, physiologically and psychoacoustically motivated sigma-pi cells as secondary features, and a linear or non-linear neural network as the classifier. With a non-linear neural network back end, the SNR in time segments of 1 s is estimated with a root-mean-square error of 5.68 dB on unknown test material. This performance is obtained on a large set of natural noise types that includes non-stationary signals and alarm sounds, although the SNR estimation works best for more stationary types of noise. The individual components of the estimation algorithm are examined with respect to their importance for the estimation accuracy. Compared with other methods for short-term SNR estimation known from the literature, the algorithm yields similar or better results at comparable computational effort. The new approach is based purely on slow spectro-temporal modulations and is therefore a valuable contribution to both digital hearing aids and ASR systems.
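
The two quantities the abstract reports, the sub-band SNR in 1 s segments and the root-mean-square error of an estimate in dB, can be illustrated with a minimal sketch. It assumes that speech and noise are available as separate signals, as is typical when building reference targets, and it uses a plain FFT band split as a stand-in for the auditory front end; it is not the authors' method. The names `subband_snr_db`, `rmse_db`, `band_edges_hz` and `segment_s` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def subband_snr_db(speech, noise, fs, band_edges_hz, segment_s=1.0):
    """Reference SNR in dB per segment and band from separate speech and noise.

    Assumption: bands are defined by FFT bin ranges, not by an auditory model.
    Returns an array of shape (n_segments, n_bands).
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    seg_len = int(round(segment_s * fs))
    n_seg = min(len(speech), len(noise)) // seg_len
    n_bands = len(band_edges_hz) - 1
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    eps = 1e-12  # guards against division by zero and log of zero
    snr = np.empty((n_seg, n_bands))
    for i in range(n_seg):
        seg = slice(i * seg_len, (i + 1) * seg_len)
        speech_pow = np.abs(np.fft.rfft(speech[seg])) ** 2
        noise_pow = np.abs(np.fft.rfft(noise[seg])) ** 2
        for b in range(n_bands):
            in_band = (freqs >= band_edges_hz[b]) & (freqs < band_edges_hz[b + 1])
            ratio = (speech_pow[in_band].sum() + eps) / (noise_pow[in_band].sum() + eps)
            snr[i, b] = 10.0 * np.log10(ratio)
    return snr


def rmse_db(estimated_db, reference_db):
    """Root-mean-square error between estimated and reference SNR values in dB."""
    diff = np.asarray(estimated_db, dtype=float) - np.asarray(reference_db, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

As a sanity check on the definition, a tone whose amplitude is ten times that of the noise in the same band has a power ratio of 100 and therefore a sub-band SNR of 20 dB. An estimator such as the neural network back end described above would be scored by passing its per-segment outputs and these reference values to `rmse_db`.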