Probabilistic speech feature extraction with context-sensitive Bottleneck neural networks

  • Authors:
  • Martin Wöllmer; Björn Schuller


  • Venue:
  • Neurocomputing
  • Year:
  • 2014

Abstract

We introduce a novel context-sensitive feature extraction approach for spontaneous speech recognition. Since bidirectional Long Short-Term Memory (BLSTM) networks are known to improve phoneme recognition accuracy by incorporating long-range contextual information into speech decoding, we integrate the BLSTM principle into a Tandem front-end for probabilistic feature extraction. Unlike previously proposed approaches, which exploit BLSTM modeling by generating a discrete phoneme prediction feature, our feature extractor merges continuous, high-level probabilistic BLSTM features with low-level features. By combining BLSTM modeling and Bottleneck (BN) feature generation, we propose a novel front-end that produces context-sensitive probabilistic feature vectors of arbitrary size, independent of the network training targets. Evaluations on challenging spontaneous, conversational speech recognition tasks show that this concept prevails over recently published architectures for feature-level context modeling.
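
To make the described front-end concrete, the following is a minimal sketch of a BLSTM acoustic model with a narrow Bottleneck layer whose activations are concatenated with the low-level input features to form Tandem-style feature vectors. This is not the authors' implementation: the use of PyTorch, all layer sizes, the 39-dimensional low-level input, and the phoneme target count are illustrative assumptions.

```python
# Sketch only: BLSTM + Bottleneck (BN) Tandem front-end.
# The BN activations act as context-sensitive probabilistic features
# whose dimensionality is independent of the phoneme training targets.
import torch
import torch.nn as nn


class BLSTMBottleneckFrontEnd(nn.Module):
    def __init__(self, num_lowlevel=39, hidden_size=128,
                 bottleneck_size=30, num_phoneme_targets=41):
        super().__init__()
        # Bidirectional LSTM captures long-range context in both directions.
        self.blstm = nn.LSTM(num_lowlevel, hidden_size,
                             batch_first=True, bidirectional=True)
        # Narrow bottleneck layer: its width fixes the size of the
        # extracted feature vector, not the number of training targets.
        self.bottleneck = nn.Linear(2 * hidden_size, bottleneck_size)
        # Output layer used only while training on phoneme targets.
        self.classifier = nn.Linear(bottleneck_size, num_phoneme_targets)

    def forward(self, lowlevel_feats):
        # lowlevel_feats: (batch, frames, num_lowlevel), e.g. MFCC-style features
        context, _ = self.blstm(lowlevel_feats)
        bn = torch.tanh(self.bottleneck(context))
        logits = self.classifier(bn)  # phoneme posteriors for training
        # Tandem-style output: BN features concatenated with low-level features
        tandem = torch.cat([bn, lowlevel_feats], dim=-1)
        return logits, tandem


# Usage example: two utterances of 100 frames each
model = BLSTMBottleneckFrontEnd()
x = torch.randn(2, 100, 39)
logits, features = model(x)
print(features.shape)  # torch.Size([2, 100, 69]) -> 30 BN + 39 low-level dims
```

Because the feature dimensionality is set by the bottleneck width rather than by the number of phoneme targets, such a front-end can produce probabilistic feature vectors of arbitrary size, which is the property the abstract emphasizes.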