Sub-band SNR estimation using auditory feature processing
Speech Communication - Special issue on speech processing for hearing aids
An algorithm is proposed which automatically estimates the local signal-to-noise ratio (SNR) between speech and noise. The feature extraction stage of the algorithm is motivated by neurophysiological findings on amplitude modulation processing in higher stages of the mammalian auditory system. It analyzes information on both the center frequencies and the amplitude modulations of the input signal. This information is represented in two-dimensional, so-called amplitude modulation spectrograms (AMS). A neural network is trained on a large number of AMS patterns generated from mixtures of speech and noise. After training, the network supplies estimates of the local SNR when AMS patterns from "unknown" sound sources are presented. Classification experiments show relatively accurate estimation of the local SNR in independent 32 ms analysis frames. Harmonicity appears to be the most important cue for analysis frames to be classified as "speech-like", but the spectro-temporal representation of sound in AMS patterns also allows for reliable discrimination between unvoiced speech and noise.
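To make the feature extraction stage concrete, the following is a minimal sketch of how a two-dimensional AMS pattern (center frequency × modulation frequency) can be computed with NumPy: short-time spectra yield sub-band envelopes, and an FFT across time of each band's envelope gives its modulation spectrum. The band counts, window sizes, and hop length here are illustrative assumptions, not the parameters used by the authors.

```python
import numpy as np

def ams_pattern(x, n_freq_bands=15, n_mod_bins=15, env_frame=128, env_hop=32):
    """Sketch of an amplitude modulation spectrogram (AMS) pattern.

    Returns a 2-D array: rows are coarse center-frequency bands,
    columns are low modulation-frequency bins. All sizes are
    illustrative assumptions, not the paper's exact configuration.
    """
    # 1. Short-time magnitude spectra of the input signal
    n_frames = 1 + (len(x) - env_frame) // env_hop
    win = np.hanning(env_frame)
    spec = np.array([
        np.abs(np.fft.rfft(win * x[i * env_hop:i * env_hop + env_frame]))
        for i in range(n_frames)
    ])  # shape: (frames, fft_bins)

    # 2. Pool FFT bins into coarse frequency bands -> one envelope per band
    edges = np.linspace(0, spec.shape[1], n_freq_bands + 1).astype(int)
    env = np.stack(
        [spec[:, a:b].mean(axis=1) for a, b in zip(edges[:-1], edges[1:])],
        axis=0,
    )  # shape: (bands, frames)

    # 3. FFT across time of each band's envelope -> modulation spectrum
    env = env - env.mean(axis=1, keepdims=True)   # remove the DC component
    mod = np.abs(np.fft.rfft(env * np.hanning(env.shape[1]), axis=1))

    # 4. Keep only the lowest modulation-frequency bins as the AMS pattern
    return mod[:, :n_mod_bins]
```

In a setup like the one described above, such patterns (flattened to vectors) would serve as the input features for a neural network trained to map each analysis frame to its local SNR.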