Hybrid HMM/BLSTM-RNN for robust speech recognition

  • Authors:
  • Yang Sun; Louis Ten Bosch; Lou Boves

  • Affiliation:
  • Department of Linguistics, Radboud University, Nijmegen, The Netherlands (all authors)

  • Venue:
  • TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
  • Year:
  • 2010


Abstract

The question of how to integrate information from different sources into speech decoding (layered architecture versus integrated search) is still only partially solved. We investigate the optimal integration of information from Artificial Neural Networks into a speech decoding scheme based on a Dynamic Bayesian Network (DBN) for noise-robust ASR. An HMM implemented by the DBN cooperates with a novel Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN), which exploits long-range context information to predict a phoneme for each MFCC frame. When the identity of the most likely phoneme is used as a direct observation, such a hybrid system has been shown to improve noise robustness. In this paper, we present the complete BLSTM-RNN output to the DBN as Virtual Evidence. This allows the hybrid system to use information about all phoneme candidates, which was not possible in previous experiments. Our approach improved word accuracy on the Aurora 2 corpus by 8%.
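The paper itself does not include code, but the core idea of the abstract can be illustrated with a minimal sketch: instead of keeping only the single most likely phoneme, the full per-frame posterior vector from the network enters the HMM forward recursion as a soft (virtual) evidence factor multiplied into the acoustic likelihood. All variable names, matrix values, and the toy dimensions below are hypothetical, not taken from the paper.

```python
import numpy as np

def forward_virtual_evidence(init, trans, obs_lik, ve):
    """HMM forward pass where each frame's observation likelihood is
    combined with a virtual-evidence vector (e.g. BLSTM-RNN phoneme
    posteriors), so every phoneme candidate contributes to decoding
    rather than only the argmax phoneme."""
    T, N = obs_lik.shape
    alpha = np.zeros((T, N))
    # frame 0: prior * acoustic likelihood * soft network evidence
    alpha[0] = init * obs_lik[0] * ve[0]
    alpha[0] /= alpha[0].sum()                 # scale to avoid underflow
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * obs_lik[t] * ve[t]
        alpha[t] /= alpha[t].sum()
    return alpha

# Toy example: 3 phoneme states, 5 frames (all numbers invented).
rng = np.random.default_rng(0)
init = np.array([0.6, 0.3, 0.1])
trans = np.array([[0.80, 0.15, 0.05],
                  [0.10, 0.80, 0.10],
                  [0.05, 0.15, 0.80]])
obs = rng.random((5, 3)) + 0.1                 # stand-in acoustic likelihoods
post = rng.random((5, 3))
post /= post.sum(axis=1, keepdims=True)        # stand-in BLSTM posteriors
alpha = forward_virtual_evidence(init, trans, obs, post)
```

Setting `ve[t]` to a one-hot vector recovers the earlier hard-decision hybrid described in the abstract, while a full posterior keeps evidence for all competing phonemes in the search.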