Temporal modulation normalization for robust speech feature extraction and recognition

Authors:
Xugang Lu; Shigeki Matsuda; Masashi Unoki; Satoshi Nakamura
Affiliations:
National Institute of Information and Communications Technology, Tokyo, Japan 184-8795; National Institute of Information and Communications Technology, Tokyo, Japan 184-8795; Japan Advanced Institute of Science and Technology, Ishikawa, Japan 923-1292; National Institute of Information and Communications Technology, Tokyo, Japan 184-8795
Venue:
Multimedia Tools and Applications
Year:
2011

Abstract

Speech signals are produced by articulatory movements, and their temporal modulation structure is constrained by regular phonetic sequences. This modulation structure encodes most of the intelligibility information in speech and can be used to discriminate speech from noise. In this study, we propose a noise reduction algorithm based on this modulation property of speech. The algorithm involves two steps: temporal modulation contrast normalization and modulation-event-preserving smoothing. The first step normalizes the modulation contrast of clean and noisy speech to the same level, and the second smooths out the modulation artifacts caused by noise interference. Besides serving as a standalone noise reduction method, the proposed method can be combined with conventional noise reduction methods to further reduce the effect of noise. We evaluated the proposed method as a front end for robust speech recognition on the AURORA-2J corpus. Two advanced noise reduction methods, the ETSI advanced front-end (AFE) and particle filtering (PF) with minimum mean square error (MMSE) estimation, were used both for comparison and in combination with the proposed method. Experimental results showed that, as a standalone front-end processor, the proposed method outperformed both advanced methods, and that each combined front end consistently performed better than either of its component methods used alone.
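The abstract describes the two processing steps only at a high level. The following is a minimal sketch of one plausible reading of them, applied to a single feature trajectory (for example, one log mel filterbank channel over frames); it is not the authors' implementation. Assumptions: temporal contrast is measured as the utterance-level standard deviation of the trajectory, and the modulation-event-preserving smoothing is stood in for by a 1-D bilateral filter along time. The function names and parameter values (`half_win`, `sigma_t`, `sigma_r`) are hypothetical.

```python
import math
from typing import List, Sequence


def contrast_normalize(traj: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Step 1 (assumed form): rescale a non-empty trajectory to zero mean, unit std.

    Assumption: contrast is measured as the standard deviation over the whole
    utterance, so clean and noisy versions end up at the same contrast level.
    """
    n = len(traj)
    mean = sum(traj) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in traj) / n + eps)
    return [(v - mean) / std for v in traj]


def bilateral_smooth(traj: Sequence[float], half_win: int = 3,
                     sigma_t: float = 2.0, sigma_r: float = 0.5) -> List[float]:
    """Step 2 (stand-in): 1-D bilateral filter along time.

    Each output frame is a weighted average of its neighbours. The weight is
    the product of a Gaussian in time distance (sigma_t, in frames) and a
    Gaussian in value difference (sigma_r). Neighbours on the other side of a
    sharp onset or offset get near-zero weight, so the edge is not blurred.
    Parameter values are hypothetical, not taken from the paper.
    """
    n = len(traj)
    out = []
    for t in range(n):
        num = den = 0.0
        for k in range(-half_win, half_win + 1):
            s = t + k
            if 0 <= s < n:
                w_time = math.exp(-(k * k) / (2.0 * sigma_t ** 2))
                w_range = math.exp(-((traj[s] - traj[t]) ** 2) / (2.0 * sigma_r ** 2))
                w = w_time * w_range
                num += w * traj[s]
                den += w  # k == 0 contributes w == 1, so den is never zero
        out.append(num / den)
    return out


def normalize_and_smooth(traj: Sequence[float]) -> List[float]:
    """Contrast normalization followed by edge-preserving smoothing."""
    return bilateral_smooth(contrast_normalize(traj))


if __name__ == "__main__":
    # Step-like trajectory with an alternating ripple standing in for noise.
    demo = [0.0] * 10 + [1.0] * 10
    demo = [v + 0.05 * (-1) ** i for i, v in enumerate(demo)]
    print([round(v, 3) for v in normalize_and_smooth(demo)])
```

The range kernel is the design choice that matters here: a plain moving average would blur the sharp onsets and offsets that mark modulation events, whereas the bilateral weights let the filter average away the ripple within each plateau while leaving the step between plateaus largely intact.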