This article proposes and evaluates several methods for integrating bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for context-sensitive Tandem feature extraction and show how the Connectionist Temporal Classification (CTC) approach can serve as a BLSTM-based back-end, as an alternative to Hidden Markov Models (HMMs). We combine context-sensitive BLSTM-based feature generation and speech decoding techniques with source separation by convolutive non-negative matrix factorization (NMF). Applying our speaker-adapted multi-stream HMM framework, which processes MFCC features from NMF-enhanced speech as well as word predictions obtained via BLSTM networks and non-negative sparse classification (NSC), we achieve an average accuracy of 91.86% on the PASCAL CHiME Challenge task at signal-to-noise ratios ranging from -6 to 9 dB. To our knowledge, this is the best result reported for the CHiME Challenge task to date.
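To illustrate the source-separation component: the paper uses convolutive NMF with trained speech bases, but the core idea — factorizing a non-negative magnitude spectrogram V into bases W and activations H, then reconstructing an enhanced estimate from a subset of components — can be sketched with standard (non-convolutive) NMF via multiplicative updates. Everything below (the toy spectrogram, the rank, the split of components into "speech" vs. "noise") is a hypothetical illustration, not the paper's actual configuration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Basic NMF minimizing squared error, V ~= W @ H, via multiplicative updates.
    V must be non-negative; W (bases) and H (activations) stay non-negative."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update bases
    return W, H

# Toy "magnitude spectrogram" (20 frequency bins x 50 frames) -- stand-in data.
rng = np.random.default_rng(1)
V = np.abs(rng.normal(size=(20, 50)))

W, H = nmf(V, rank=4)

# Enhancement sketch: if the first two components were known to model speech
# (e.g. bases pre-trained on clean speech), the enhanced spectrogram would be
# their partial reconstruction. The split chosen here is purely illustrative.
speech_est = W[:, :2] @ H[:2, :]
```

In the actual system, the speech bases are convolutive (each basis spans several consecutive frames) and are trained per speaker on clean data, so only the noise part is adapted to the observed mixture; the enhanced spectrogram then feeds MFCC extraction for the multi-stream HMM.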