Recognition of speech in additive and convolutional noise based on RASTA spectral processing

Authors:
Hynek Hermansky;Nelson Morgan;Hans-Gunter Hirsch
Affiliations:
US WEST Advanced Technologies, Boulder, Colorado and International Computer Science Institute, Berkeley, California;International Computer Science Institute, Berkeley, California;University of Aachen, Germany and International Computer Science Institute, Berkeley, California
Venue:
ICASSP'93: Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing: Speech Processing, Volume II
Year:
1993



Abstract

RASTA speech processing was originally developed to reduce the sensitivity of recognizers to the frequency characteristics of the operating environment (i.e., to convolutional noise). RASTA does this by band-pass filtering the time trajectories of logarithmic speech parameters (e.g., logarithmic spectral energies or cepstra). In the current paper, we study RASTA processing in an alternative spectral domain that is linear-like for small spectral values and logarithmic-like for large spectral values. Experiments with a recognizer trained on clean speech and tested on data degraded by both convolutional and additive noise show that RASTA processing in the new domain yields results comparable to those obtained by training the recognizer on the known noise.
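The two ideas in the abstract, band-pass filtering of parameter trajectories over time and a compressive domain that is linear-like for small values and log-like for large ones, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give filter coefficients, so the sketch assumes the commonly cited RASTA filter H(z) = 0.1 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1). It also assumes the lin-log mapping y = ln(1 + Jx) with a hypothetical constant J. All function names (`rasta_filter`, `linlog_compress`, `linlog_expand`, `jrasta`) are hypothetical. The exact inverse and the clipping of negative values are illustrative choices.

```python
import numpy as np

# ASSUMPTION: coefficients of the commonly cited RASTA band-pass filter,
#   H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1),
# implemented causally (the usual z^4 advance is omitted, so output lags 4 frames).
RASTA_NUM = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # sums to 0: zero DC gain
RASTA_POLE = 0.98


def rasta_filter(traj, pole=RASTA_POLE):
    """Band-pass filter each band's time trajectory (time runs along axis 0)."""
    traj = np.asarray(traj, dtype=float)
    out = np.zeros_like(traj)
    for t in range(traj.shape[0]):
        # FIR part: weighted sum over the current and four previous frames
        acc = np.zeros(traj.shape[1:])
        for k, b in enumerate(RASTA_NUM):
            if t - k >= 0:
                acc = acc + b * traj[t - k]
        # IIR part: single pole feeding back the previous output frame
        if t > 0:
            acc = acc + pole * out[t - 1]
        out[t] = acc
    return out


def linlog_compress(power_spec, J):
    """Hypothetical lin-log mapping y = ln(1 + J*x).

    Approximately J*x when J*x << 1 and approximately ln(J*x) when J*x >> 1.
    """
    return np.log1p(J * np.asarray(power_spec, dtype=float))


def linlog_expand(y, J):
    """Exact inverse x = (exp(y) - 1) / J; negatives produced by filtering are clipped to 0."""
    return np.maximum(np.expm1(y) / J, 0.0)


def jrasta(power_spec, J):
    """Compress, RASTA-filter along time, then expand back to the spectral domain."""
    return linlog_expand(rasta_filter(linlog_compress(power_spec, J)), J)
```

Here `power_spec` is taken to be a (frames × bands) array of non-negative spectral energies. Because the assumed numerator coefficients sum to zero, any constant offset in the filtered trajectories decays geometrically at the rate of the pole. In the log-like region, a fixed channel appears as such an offset. In the linear-like region, a stationary additive noise floor appears as one. That is one plausible reading of why a mixed domain can help with both noise types, and the choice of J sets where the transition between the two regions falls.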