An improved model of masking effects for robust speech recognition system

Authors:
Peng Dai; Ing Yann Soon
Affiliations:
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore (both authors)
Venue:
Speech Communication
Year:
2013

Abstract

The performance of an automatic speech recognition system drops dramatically in the presence of background noise, whereas the human auditory system remains adept at recognizing noisy speech. This paper proposes a novel auditory modeling algorithm that is integrated into the feature extraction front-end of a Hidden Markov Model (HMM)-based recognizer. The proposed algorithm, named LTFC, simulates properties of the human auditory system and applies them to the speech recognition system to enhance its robustness. It integrates simultaneous masking, temporal masking, and cepstral mean and variance normalization into the conventional mel-frequency cepstral coefficient (MFCC) feature extraction algorithm for robust speech recognition. The proposed method sharpens the power spectrum of the signal in both the frequency domain and the time domain. Evaluation tests are carried out on the AURORA2 database. Experimental results show that the proposed feature extraction method effectively increases the word recognition rate.
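As a rough illustration of how the abstract's three components can be chained onto an MFCC-style front-end, the following sketch operates on a matrix of mel filterbank energies (frames × channels). It is a minimal sketch under stated assumptions, not the authors' LTFC: the lateral-inhibition rule used for simultaneous masking, the peak-tracking rule used for forward temporal masking, the parameter values (`alpha`, `decay`, `floor`), the processing order, and all function names (including `masked_mfcc`) are hypothetical. Only the log/DCT-II step and the per-utterance CMVN follow standard practice.

```python
import math
import random
from typing import List

# Rows are frames; columns are mel channels (or cepstral coefficients later on).
Matrix = List[List[float]]

EPS = 1e-10  # floor that keeps energies positive before taking the log


def simultaneous_masking(frames: Matrix, alpha: float = 0.3) -> Matrix:
    """Hypothetical frequency-domain sharpening (lateral inhibition): each
    channel loses `alpha` times the mean energy of its two neighbours."""
    out = []
    for frame in frames:
        n = len(frame)
        row = []
        for k in range(n):
            # Edge channels reuse their own energy for the missing neighbour.
            left = frame[k - 1] if k > 0 else frame[k]
            right = frame[k + 1] if k < n - 1 else frame[k]
            row.append(max(frame[k] - alpha * 0.5 * (left + right), EPS))
        out.append(row)
    return out


def temporal_masking(frames: Matrix, decay: float = 0.85, floor: float = 0.2) -> Matrix:
    """Hypothetical forward masking: a per-channel peak envelope decays by
    `decay` per frame; energy below the decayed envelope counts as masked
    and is replaced by `floor` times that envelope."""
    if not frames:
        return []
    peak = [0.0] * len(frames[0])
    out = []
    for frame in frames:
        row = []
        for k, p in enumerate(frame):
            masker = decay * peak[k]
            row.append(p if p >= masker else floor * masker)
            peak[k] = max(masker, p)  # envelope tracks the unmasked energy
        out.append(row)
    return out


def log_dct(frames: Matrix, num_ceps: int = 13) -> Matrix:
    """Standard MFCC back end: log compression followed by a DCT-II."""
    out = []
    for frame in frames:
        logs = [math.log(max(p, EPS)) for p in frame]
        n = len(logs)
        out.append([
            sum(logs[m] * math.cos(math.pi * i * (m + 0.5) / n) for m in range(n))
            for i in range(num_ceps)
        ])
    return out


def cmvn(feats: Matrix) -> Matrix:
    """Per-utterance cepstral mean and variance normalization."""
    t, d = len(feats), len(feats[0])
    means = [sum(f[i] for f in feats) / t for i in range(d)]
    stds = [math.sqrt(sum((f[i] - means[i]) ** 2 for f in feats) / t) for i in range(d)]
    # Guard against constant dimensions, which would otherwise divide by zero.
    return [[(f[i] - means[i]) / (stds[i] or 1.0) for i in range(d)] for f in feats]


def masked_mfcc(mel_energies: Matrix, num_ceps: int = 13) -> Matrix:
    """Hypothetical chain: simultaneous masking -> temporal masking -> log/DCT -> CMVN."""
    sharpened = temporal_masking(simultaneous_masking(mel_energies))
    return cmvn(log_dct(sharpened, num_ceps))


if __name__ == "__main__":
    random.seed(0)
    energies = [[random.uniform(0.1, 10.0) for _ in range(23)] for _ in range(50)]
    feats = masked_mfcc(energies)
    print(len(feats), len(feats[0]))  # 50 13
```

Running the script on 50 frames of random 23-channel energies yields a 50 × 13 feature matrix whose columns have zero mean and unit variance after CMVN. Applying both masking stages before the log keeps them in the power domain, which matches the abstract's description of sharpening the power spectrum in frequency and time; whether LTFC applies them before or after mel integration is not stated here.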