Normalization of the Speech Modulation Spectra for Robust Speech Recognition

Authors:
Xiong Xiao;Eng Siong Chng;Haizhou Li
Affiliations:
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2008

Abstract

In this paper, we study a novel technique that normalizes the modulation spectra of speech signals for robust speech recognition. The modulation spectra of a speech signal are the power spectral density (PSD) functions of the feature trajectories generated from the signal; hence, they describe the temporal structure of the features. The modulation spectra are distorted when the speech signal is corrupted by noise. We propose the temporal structure normalization (TSN) filter to reduce the effects of noise by normalizing the modulation spectra to reference spectra. The TSN filter differs from other feature normalization methods, such as histogram equalization (HEQ), which normalize only the probability distributions of the speech features. Our previous work showed promising results for TSN on the small-vocabulary Aurora-2 task. In this paper, we examine theoretical and practical issues of the TSN filter, as follows. 1) We investigate the effects of noise on the speech modulation spectra and show the general characteristics of noisy speech modulation spectra. These observations help to further explain and justify the TSN filter. 2) We evaluate the TSN filter on the Aurora-4 task and demonstrate its effectiveness on a large-vocabulary task. 3) We propose a segment-based implementation of the TSN filter that significantly reduces the processing delay without affecting performance. Overall, the TSN filter produces significant improvements over the baseline systems and delivers competitive results compared with other state-of-the-art temporal filters.
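
To make the idea concrete, the following is a minimal Python sketch of normalizing the modulation spectrum of a single feature trajectory toward a reference spectrum. It is an illustration only, not the authors' implementation: the abstract does not specify how the TSN filter is designed, so the periodogram PSD estimate, the square-root gain, the windowed zero-phase FIR design, the unit gain at DC, and all names and parameters (`modulation_spectrum`, `normalizing_filter`, `n_fft`, `n_taps`, `eps`) are assumptions made here. The random trajectories are toy stand-ins for real speech features.

```python
import numpy as np


def modulation_spectrum(trajectory, n_fft=256):
    """Periodogram estimate of the PSD of one feature trajectory.

    Assumes len(trajectory) <= n_fft; longer inputs would be cropped by rfft.
    """
    x = np.asarray(trajectory, dtype=float)
    x = x - x.mean()                      # remove the mean before the FFT
    spectrum = np.fft.rfft(x, n=n_fft)
    return (np.abs(spectrum) ** 2) / len(x)


def normalizing_filter(psd_test, psd_ref, n_taps=21, eps=1e-10):
    """Design a symmetric FIR filter whose magnitude response
    approximates sqrt(psd_ref / psd_test). n_taps is assumed odd."""
    gain = np.sqrt(psd_ref / (psd_test + eps))
    gain[0] = 1.0                         # assumption: leave the DC bin untouched
    impulse = np.fft.irfft(gain)          # real gain -> impulse response symmetric about lag 0
    half = n_taps // 2
    # Reorder as lags -half..+half, then taper with a Hann window.
    taps = np.concatenate([impulse[-half:], impulse[:half + 1]])
    return taps * np.hanning(n_taps)


# Toy data: a "reference" trajectory and a noise-corrupted copy of it.
rng = np.random.default_rng(0)
clean = rng.standard_normal(200)
noisy = clean + 0.5 * rng.standard_normal(200)

psd_ref = modulation_spectrum(clean)
psd_noisy = modulation_spectrum(noisy)
taps = normalizing_filter(psd_noisy, psd_ref)

# Filter the noisy trajectory along the time axis.
normalized = np.convolve(noisy, taps, mode="same")
```

In this sketch the filter is designed per trajectory. When the test and reference spectra are identical, the designed filter reduces to approximately a unit impulse, so the trajectory passes through unchanged.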