Switching Linear Dynamical Systems for Noise Robust Speech Recognition

Authors:
B. Mesot; D. Barber
Affiliations:
IDIAP Research Institute, Martigny; -
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Abstract

Real-world applications such as hands-free dialling in cars may have to cope with very noisy environments. Existing state-of-the-art solutions to this problem use feature-based HMMs with a preprocessing stage that cleans the noisy signal. However, the effect that noise in the raw signal has on the resulting HMM features is poorly understood, and this limits the performance of the HMM system. An alternative to feature-based HMMs is to model the raw signal directly, which has the potential advantage that an explicit noise model is straightforward to include. Here we jointly model the dynamics of both the raw speech signal and the noise using a switching linear dynamical system (SLDS). The new model was tested on isolated digit utterances corrupted by Gaussian noise. Unlike the autoregressive HMM and its derivatives, which model only uncorrupted raw speech, the SLDS is comparatively noise robust, and it also significantly outperforms a state-of-the-art feature-based HMM. The computational cost of exact inference in the SLDS scales exponentially with the length of the time series. To counter this, we use expectation correction (EC), which provides a stable and accurate linear-time approximation for this important class of models, aiding their further application in acoustic modeling.
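The idea of modelling clean speech and noise jointly can be illustrated with a small generative sketch. It rests on assumptions that are not taken from the abstract: the hidden state holds the last R clean samples, the clean signal follows a switch-dependent autoregressive (AR) process written as a linear dynamical system through a companion matrix, and the observation is the clean sample plus white Gaussian noise. The function names, AR order and every parameter value are hypothetical.

```python
import numpy as np


def companion(c):
    """Companion matrix mapping [v_{t-1}, ..., v_{t-R}] to the noise-free prediction of [v_t, ..., v_{t-R+1}]."""
    R = len(c)
    A = np.zeros((R, R))
    A[0, :] = c                  # v_t = sum_r c_r v_{t-r}
    A[1:, :-1] = np.eye(R - 1)   # shift the remaining lags down by one
    return A


def sample_ar_slds(T, init, trans, ar_coeffs, innov_std, noise_std, seed=0):
    """Sample switch states, a hidden clean signal and its noisy observation.

    Hypothetical parameterisation (an assumption, not the paper's exact model):
      init[k]      : P(s_0 = k)
      trans[i, j]  : P(s_t = j | s_{t-1} = i)
      ar_coeffs[k] : AR coefficients (c_1, ..., c_R) used in switch state k
      innov_std[k] : std of the AR innovation in switch state k
      noise_std    : std of the additive white Gaussian observation noise
    """
    rng = np.random.default_rng(seed)
    S, R = ar_coeffs.shape
    A = [companion(c) for c in ar_coeffs]
    h = np.zeros(R)              # hidden state: last R clean samples, newest first
    switches = np.empty(T, dtype=int)
    clean = np.empty(T)
    noisy = np.empty(T)
    p = init
    for t in range(T):
        s = rng.choice(S, p=p)                          # draw the current switch state
        h = A[s] @ h                                    # deterministic AR prediction
        h[0] += innov_std[s] * rng.standard_normal()    # switch-dependent innovation
        switches[t] = s
        clean[t] = h[0]
        noisy[t] = h[0] + noise_std * rng.standard_normal()  # observation = clean + noise
        p = trans[s]
    return switches, clean, noisy
```

Setting `noise_std = 0` makes the observation equal to the hidden clean signal, i.e. a plain switching AR model of uncorrupted speech; a positive `noise_std` separates what is observed from the hidden clean signal, which is what gives inference room to attribute part of the signal to noise.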
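Exact filtering in an SLDS must keep one Gaussian per switch history, i.e. S^t components after t steps with S switch states. A common remedy is to moment-match the mixture back to one Gaussian per switch state after every observation, which keeps the cost linear in T. The sketch below is such a forward-only, assumed-density filter under stated assumptions (scalar observations, h_{-1} = 0, hypothetical function and argument names). It is not expectation correction itself, which is a smoothing method that additionally runs a backward pass over these forward estimates.

```python
import numpy as np


def collapsed_slds_filter(y, init, trans, A, Q, B, r):
    """Approximate forward filtering for an SLDS with scalar observations.

    Hypothetical model:
        s_0 ~ init,  s_t | s_{t-1} ~ trans[s_{t-1}]
        h_t = A[s_t] h_{t-1} + eta_t,  eta_t ~ N(0, Q[s_t]),  h_{-1} = 0
        y_t = B . h_t + eps_t,          eps_t ~ N(0, r)
    After each step the S*S candidate Gaussians are moment-matched down to
    one Gaussian per switch state, so the cost is O(T * S^2) Kalman updates.
    Returns (approximate log-likelihood, switch posterior, means, covariances).
    """
    S, R = A.shape[0], A.shape[1]
    logw = np.zeros(1)                 # one dummy "previous" component at t = -1
    m = np.zeros((1, R))
    P = np.zeros((1, R, R))
    prior = init[None, :]              # the first switch state is drawn from init
    loglik = 0.0
    for yt in y:
        n = len(logw)
        logW = np.empty((n, S))
        M = np.empty((n, S, R))
        C = np.empty((n, S, R, R))
        for i in range(n):
            for j in range(S):
                mp = A[j] @ m[i]                       # Kalman predict under s_t = j
                Pp = A[j] @ P[i] @ A[j].T + Q[j]
                s2 = B @ Pp @ B + r                    # predictive variance of y_t
                resid = yt - B @ mp
                K = Pp @ B / s2                        # Kalman gain
                M[i, j] = mp + K * resid               # Kalman update
                C[i, j] = Pp - s2 * np.outer(K, K)
                with np.errstate(divide="ignore"):     # log(0) -> -inf is harmless here
                    logW[i, j] = (logw[i] + np.log(prior[i, j])
                                  - 0.5 * (np.log(2 * np.pi * s2) + resid ** 2 / s2))
        top = logW.max()
        step = top + np.log(np.exp(logW - top).sum())  # log p(y_t | y_{0:t-1}), approximate
        loglik += step
        W = np.exp(logW - step)                        # posterior over (s_{t-1}, s_t)
        w = W.sum(axis=0)                              # posterior over s_t
        safe = np.where(w > 0, w, 1.0)
        # Collapse: match mean and covariance of the mixture for each s_t = j.
        m = np.einsum("ij,ijk->jk", W, M) / safe[:, None]
        D = M - m[None, :, :]
        P = (np.einsum("ij,ijkl->jkl", W, C)
             + np.einsum("ij,ijk,ijl->jkl", W, D, D)) / safe[:, None, None]
        with np.errstate(divide="ignore"):
            logw = np.log(w)
        prior = trans
    return loglik, np.exp(logw), m, P
```

With a single switch state the collapse is exact and the routine reduces to a Kalman filter. For S = 2 and T = 100 steps, exact filtering would need 2^100 (about 1.3 × 10^30) Gaussian components, whereas this sketch keeps 2 after every step.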