Enhancing spontaneous speech recognition with BLSTM features

Authors:
Martin Wöllmer; Björn Schuller
Affiliations:
Institute for Human-Machine Communication, Technische Universität München, München, Germany; Institute for Human-Machine Communication, Technische Universität München, München, Germany
Venue:
NOLISP'11 Proceedings of the 5th international conference on Advances in nonlinear speech processing
Year:
2011


Abstract

This paper introduces a novel context-sensitive feature extraction approach for spontaneous speech recognition. Bidirectional Long Short-Term Memory (BLSTM) networks are known to improve phoneme recognition accuracy by incorporating long-range contextual information into speech decoding, so we integrate the BLSTM principle into a Tandem front-end for probabilistic feature extraction. Unlike previously proposed approaches that exploit BLSTM modeling by generating a discrete phoneme prediction feature, our feature extractor merges continuous, high-level probabilistic BLSTM features with low-level features. Evaluations on challenging spontaneous, conversational speech recognition tasks show that this concept outperforms recently published architectures for feature-level context modeling.
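The abstract contrasts two ways of handing BLSTM output to a recognizer: a discrete phoneme prediction (one label per frame) versus continuous probabilistic features merged with low-level features in a Tandem front-end. The following Python sketch illustrates only that feature-level contrast and rests on several assumptions: the BLSTM is treated as a black box that already yields per-frame phoneme posteriors, the low-level features (for example MFCCs) are frame-aligned with those posteriors, and the function names `discrete_prediction_feature` and `tandem_features` are hypothetical. Log compression of the posteriors is a common Tandem choice; the decorrelation and dimensionality reduction (such as PCA) that Tandem systems usually apply is deliberately omitted, so this is not the authors' exact pipeline.

```python
import math
from typing import List, Sequence

Frame = List[float]


def discrete_prediction_feature(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Discrete variant (assumed): keep only the index of the most likely
    phoneme per frame, discarding the rest of the posterior distribution."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def tandem_features(low_level: Sequence[Sequence[float]],
                    posteriors: Sequence[Sequence[float]],
                    floor: float = 1e-10) -> List[Frame]:
    """Continuous variant (assumed): append log phoneme posteriors to the
    low-level feature vector of the same frame. A PCA/decorrelation step,
    typical in Tandem systems, is intentionally left out of this sketch."""
    if len(low_level) != len(posteriors):
        raise ValueError("feature streams must have the same number of frames")
    merged: List[Frame] = []
    for ll, post in zip(low_level, posteriors):
        # Floor the probabilities so log(0) cannot occur.
        log_post = [math.log(max(p, floor)) for p in post]
        merged.append(list(ll) + log_post)
    return merged


if __name__ == "__main__":
    mfcc = [[1.0, 2.0], [0.5, -1.0]]           # toy 2-dim low-level features, 2 frames
    post = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]  # toy posteriors over 3 phonemes
    print(discrete_prediction_feature(post))   # [0, 2]
    print(len(tandem_features(mfcc, post)[0]))  # 5
```

The discrete feature reduces each frame to a single integer, whereas the merged vector keeps the full (log) posterior distribution next to the acoustic features, which is the distinction the abstract draws between earlier approaches and the proposed front-end.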