A temporal warped 2D psychoacoustic modeling for robust speech recognition system

Authors:
Peng Dai;Ing Yann Soon
Affiliations:
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore;School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
Venue:
Speech Communication
Year:
2011

Abstract

The human auditory system outperforms automatic speech recognition systems under noisy conditions, which motivates incorporating models of human auditory processing into automatic speech recognition engines. In this paper, a hybrid feature extraction method that models forward masking, backward masking, and lateral inhibition is incorporated into the extraction of mel-frequency cepstral coefficients (MFCC). The integration is implemented using a warped 2D psychoacoustic filter. The AURORA2 database is used for testing, and hidden Markov models (HMMs) are used for recognition. Comparisons are made against lateral inhibition (LI), forward masking (FM), cepstral mean and variance normalization (CMVN), the original 2D psychoacoustic filter, and the RASTA filter. Experimental results show that the word recognition rate is significantly improved, especially under noisy conditions.
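
To make the idea of a 2D psychoacoustic filter more concrete, the following is a minimal sketch, not the filter from the paper. It assumes the filter operates on a time-by-band matrix of log mel filter-bank energies, and it uses a hypothetical 3x3 kernel whose weights are illustrative only: a negative weight on the previous frame stands in for forward masking, a smaller negative weight on the next frame for backward masking, and negative weights on adjacent bands for lateral inhibition. The temporal warping described in the abstract is not modeled here. The names `filter_2d` and `KERNEL` are assumptions introduced for this example.

```python
# Illustrative sketch only: the kernel weights are assumptions chosen for
# demonstration, not the coefficients of the published filter.

# Rows are time offsets (-1, 0, +1); columns are frequency-band offsets (-1, 0, +1).
KERNEL = [
    [0.0, -0.2, 0.0],     # previous frame suppresses the current one (forward masking)
    [-0.15, 1.0, -0.15],  # adjacent bands suppress the current one (lateral inhibition)
    [0.0, -0.1, 0.0],     # next frame suppresses the current one (backward masking)
]


def filter_2d(spec, kernel):
    """Apply a 2D kernel to spec[t][f] (frames x bands), replicating edge values."""
    n_frames, n_bands = len(spec), len(spec[0])
    kt, kf = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * n_bands for _ in range(n_frames)]
    for t in range(n_frames):
        for f in range(n_bands):
            acc = 0.0
            for i, row in enumerate(kernel):
                for j, w in enumerate(row):
                    # Clamp indices so border frames and bands reuse the nearest value.
                    tt = min(max(t + i - kt, 0), n_frames - 1)
                    ff = min(max(f + j - kf, 0), n_bands - 1)
                    acc += w * spec[tt][ff]
            out[t][f] = acc
    return out


if __name__ == "__main__":
    # A single energy peak in the middle of a 3-frame, 3-band patch.
    spec = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    for frame in filter_2d(spec, KERNEL):
        print(frame)
```

With a kernel of this shape, an isolated peak keeps its own value while the neighboring frames and bands are pushed below zero, which sharpens spectral and temporal contrast. In a full front end, the filtered energies would then go through the usual discrete cosine transform to produce cepstral coefficients.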