Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition

Authors:
X. Lu;S. Matsuda;M. Unoki;S. Nakamura
Affiliations:
National Institute of Information and Communications Technology, Japan;National Institute of Information and Communications Technology, Japan;Japan Advanced Institute of Science and Technology, Japan;National Institute of Information and Communications Technology, Japan
Venue:
Speech Communication
Year:
2010

Abstract

Traditionally, noise reduction methods for additive noise have differed substantially from those for reverberation. In this study, we investigated the effects of additive noise and reverberation on speech on the basis of the concept of temporal modulation transfer. We first analyzed the effect of noise on the temporal modulation of speech. On the basis of this analysis, we then proposed a two-stage processing algorithm that adaptively normalizes the temporal modulation of speech to extract robust speech features for automatic speech recognition. In the first stage, the temporal modulation contrast of the cepstral time series is normalized for both clean and noisy speech. In the second stage, the contrast-normalized temporal modulation spectrum is smoothed to reduce noise-induced artifacts while preserving the information in the speech modulation events (edges). We tested the algorithm in speech recognition experiments on the AURORA-2J data corpus under three conditions: additive noise, reverberation, and noisy (both additive noise and reverberation). Our results showed that, within a uniform processing framework, the algorithm achieved the following: (1) for the additive noise condition, a 55.85% relative word error reduction (RWER) rate with clean-condition training and a 41.64% RWER rate with multi-condition training; (2) for the reverberant condition, a 51.28% RWER rate; and (3) for the noisy condition (both additive noise and reverberation), a 95.03% RWER rate. In addition, we evaluated the performance of each stage of the proposed algorithm in AURORA-2J and AURORA4 experiments, and compared the performance of our algorithm with that of two similar processing algorithms in the second stage. The evaluation results further confirmed the effectiveness of the proposed algorithm.
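
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything concrete here is an assumption made for illustration: stage 1 is approximated by per-coefficient mean and variance normalization over time, stage 2 by a one-dimensional bilateral filter along the time axis, and the names `contrast_normalize`, `bilateral_smooth_1d`, `two_stage`, and `rwer`, as well as the parameter values, are hypothetical. The `rwer` helper uses the usual definition of relative word error reduction.

```python
import numpy as np


def contrast_normalize(cepstra, eps=1e-8):
    """Stage 1 (assumed form): normalize each cepstral trajectory over time
    to zero mean and unit variance. cepstra has shape (frames, coefficients)."""
    mean = cepstra.mean(axis=0, keepdims=True)
    std = cepstra.std(axis=0, keepdims=True)
    return (cepstra - mean) / (std + eps)


def bilateral_smooth_1d(x, half_width=4, sigma_t=2.0, sigma_r=0.5):
    """Stage 2 (assumed form): 1-D bilateral filter along time.
    Neighbors are weighted by temporal distance (sigma_t) and by value
    difference (sigma_r), so large jumps (edges) are not averaged away."""
    x = np.asarray(x, dtype=float)
    offsets = np.arange(-half_width, half_width + 1)
    time_w = np.exp(-0.5 * (offsets / sigma_t) ** 2)
    padded = np.pad(x, half_width, mode="edge")
    out = np.empty_like(x)
    for t in range(len(x)):
        window = padded[t : t + 2 * half_width + 1]  # centered on x[t]
        range_w = np.exp(-0.5 * ((window - x[t]) / sigma_r) ** 2)
        w = time_w * range_w
        out[t] = np.sum(w * window) / np.sum(w)
    return out


def two_stage(cepstra, **smooth_kwargs):
    """Apply the assumed stage 1, then the assumed stage 2 per coefficient."""
    normalized = contrast_normalize(cepstra)
    columns = [
        bilateral_smooth_1d(normalized[:, k], **smooth_kwargs)
        for k in range(normalized.shape[1])
    ]
    return np.stack(columns, axis=1)


def rwer(baseline_wer, new_wer):
    """Relative word error reduction in percent (standard definition)."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 13))  # synthetic stand-in for cepstra
    print(two_stage(features).shape)
```

In this sketch, `sigma_r` controls how large a jump must be to count as an edge: a small value keeps sharp modulation events intact, while a large value makes the filter behave like ordinary Gaussian smoothing.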