Sub-band SNR estimation using auditory feature processing
Speech Communication - Special issue on speech processing for hearing aids
An algorithm is proposed which automatically estimates the local signal-to-noise ratio (SNR) between speech and noise. The feature extraction stage of the algorithm is motivated by neurophysiological findings on amplitude modulation processing in higher stages of the mammalian auditory system. It analyzes information on both the center frequencies and the amplitude modulations of the input signal. This information is represented in two-dimensional, so-called amplitude modulation spectrograms (AMS). A neural network is trained on a large number of AMS patterns generated from mixtures of speech and noise. After training, the network supplies estimates of the local SNR when AMS patterns from "unknown" sound sources are presented. Classification experiments show relatively accurate estimation of the local SNR in independent 32 ms analysis frames. Harmonicity appears to be the most important cue for analysis frames to be classified as "speech-like", but the spectro-temporal representation of sound in AMS patterns also allows for reliable discrimination between unvoiced speech and noise.
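To make the feature extraction stage concrete, the following is a minimal sketch of how a two-dimensional AMS pattern (center frequency × modulation frequency) can be computed with NumPy: short-time spectra yield sub-band envelopes, and an FFT across time of each band's envelope gives its modulation spectrum. The band counts, window sizes, and hop length here are illustrative assumptions, not the parameters used by the authors.

```python
import numpy as np

def ams_pattern(x, n_freq_bands=15, n_mod_bins=15, env_frame=128, env_hop=32):
    """Sketch of an amplitude modulation spectrogram (AMS) pattern.

    Returns a 2-D array: rows are coarse center-frequency bands,
    columns are low modulation-frequency bins. All sizes are
    illustrative assumptions, not the paper's exact configuration.
    """
    # 1. Short-time magnitude spectra of the input signal
    n_frames = 1 + (len(x) - env_frame) // env_hop
    win = np.hanning(env_frame)
    spec = np.array([
        np.abs(np.fft.rfft(win * x[i * env_hop:i * env_hop + env_frame]))
        for i in range(n_frames)
    ])  # shape: (frames, fft_bins)

    # 2. Pool FFT bins into coarse frequency bands -> one envelope per band
    edges = np.linspace(0, spec.shape[1], n_freq_bands + 1).astype(int)
    env = np.stack(
        [spec[:, a:b].mean(axis=1) for a, b in zip(edges[:-1], edges[1:])],
        axis=0,
    )  # shape: (bands, frames)

    # 3. FFT across time of each band's envelope -> modulation spectrum
    env = env - env.mean(axis=1, keepdims=True)   # remove the DC component
    mod = np.abs(np.fft.rfft(env * np.hanning(env.shape[1]), axis=1))

    # 4. Keep only the lowest modulation-frequency bins as the AMS pattern
    return mod[:, :n_mod_bins]
```

In a setup like the one described above, such patterns (flattened to vectors) would serve as the input features for a neural network trained to map each analysis frame to its local SNR.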