Perceptual features for automatic speech recognition in noisy environments

Authors:
Serajul Haque;Roberto Togneri;Anthony Zaknich
Affiliations:
School of Electrical, Electronic and Computer Engineering, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia;School of Electrical, Electronic and Computer Engineering, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia;School of Electrical, Electronic and Computer Engineering, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
Venue:
Speech Communication
Year:
2009



Abstract

Two perceptual properties of the peripheral auditory system, synaptic adaptation and two-tone suppression, are compared for automatic speech recognition (ASR) in an additive noise environment. A simple method of synaptic adaptation, as determined by psychoacoustic observations, was implemented by temporal processing of speech, with a zero-crossing auditory model as the pre-processing front end. The concept is similar to RASTA processing, but a high-pass infinite impulse response (IIR) filter is used instead of bandpass filters. It is shown that rapid synaptic adaptation can be implemented by temporal processing with the zero-crossing algorithm, which is not otherwise possible in a spectral-domain implementation. Two-tone suppression was implemented in the zero-crossing auditory model using a companding strategy. Recognition performance with the two perceptual features was evaluated on the isolated-digit (TIDIGITS) corpus using a continuous-density HMM recognizer in white, factory, babble and Volvo noise. Synaptic adaptation is observed to perform better in stationary white Gaussian noise. In the presence of non-stationary, non-Gaussian noise, however, either no improvement or a degradation is observed. Two-tone suppression shows the reciprocal effect: better performance in non-Gaussian real-world noise and degradation in stationary white Gaussian noise.
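
As a rough illustration of the RASTA-like idea mentioned in the abstract, the sketch below applies a first-order high-pass IIR filter along the time trajectory of each feature channel, so that sustained levels decay while onsets are emphasized. It is a minimal sketch under stated assumptions, not the authors' implementation: the filter order, the coefficient `alpha`, the zero initial state, and the function names `highpass_trajectory` and `adapt_features` are all hypothetical, and the abstract does not specify the actual filter design or how it is coupled to the zero-crossing model.

```python
def highpass_trajectory(values, alpha=0.9):
    """Filter one channel's time trajectory with a first-order high-pass IIR.

    Difference equation (assumed for illustration, not taken from the paper):
        y[n] = alpha * (y[n-1] + x[n] - x[n-1])
    with x[-1] = y[-1] = 0, so a sustained input decays geometrically.
    """
    out = []
    prev_x = 0.0  # previous input sample
    prev_y = 0.0  # previous output sample
    for x in values:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def adapt_features(frames, alpha=0.9):
    """Apply the high-pass filter independently to every channel.

    `frames` is a list of feature vectors (one list of floats per frame);
    the result has the same shape.
    """
    if not frames:
        return []
    n_channels = len(frames[0])
    # Filter each channel across time, then transpose back to frame order.
    columns = [
        highpass_trajectory([frame[c] for frame in frames], alpha)
        for c in range(n_channels)
    ]
    return [[columns[c][t] for c in range(n_channels)]
            for t in range(len(frames))]


if __name__ == "__main__":
    # A constant input: the response is strongest at onset and then decays.
    print(highpass_trajectory([1.0] * 5, alpha=0.5))
```

A larger `alpha` (closer to 1) gives a slower decay, which corresponds to slower adaptation in this toy model; the values used here are chosen only to make the behaviour easy to see.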