Estimation of the signal-to-noise ratio with amplitude modulation spectrograms

  • Authors:
  • Jürgen Tchorz;Birger Kollmeier

  • Affiliations:
  • Phonak Hearing Systems, Laubisrütistr. 28, 8712 Stäfa, Switzerland;AG Medizinische Physik, Universität Oldenburg, 26111 Oldenburg, Germany

  • Venue:
  • Speech Communication
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

An algorithm is proposed which automatically estimates the local signal-to-noise ratio (SNR) between speech and noise. The feature extraction stage of the algorithm is motivated by neurophysiological findings on amplitude modulation processing in higher stages of the auditory system in mammals. It analyzes information on both center frequencies and amplitude modulations of the input signal. This information is represented in two-dimensional, so-called amplitude modulation spectrograms (AMS). A neural network is trained on a large number of AMS patterns generated from mixtures of speech and noise. After training, the network supplies estimates of the local SNR when AMS patterns from "unknown" sound sources are presented. Classification experiments show a relatively accurate estimation of the present SNR in independent 32 ms analysis frames. Harmonicity appears to be the most important cue for analysis frames to be classified as "speech-like", but the spectro-temporal representation of sound in AMS patterns also allows for a reliable discrimination between unvoiced speech and noise.