Keyword spotting exploiting Long Short-Term Memory

Authors:
Martin Wöllmer; Björn Schuller; Gerhard Rigoll
Affiliations:
Technische Universität München, Institute for Human-Machine Communication, Theresienstr. 90, 80333 München, Germany;Technische Universität München, Institute for Human-Machine Communication, Theresienstr. 90, 80333 München, Germany;Technische Universität München, Institute for Human-Machine Communication, Theresienstr. 90, 80333 München, Germany
Venue:
Speech Communication
Year:
2013

Abstract

We investigate various techniques for keyword spotting which are exclusively based on acoustic modeling and do not presume the existence of an in-domain language model. Since adequate context modeling is nevertheless necessary for word spotting, we show how the principle of Long Short-Term Memory (LSTM) can be incorporated into the decoding process. We propose a novel technique that exploits LSTM in combination with Connectionist Temporal Classification in order to improve performance by using a self-learned amount of contextual information. All considered approaches are evaluated on read speech as contained in the TIMIT corpus as well as on the SEMAINE database which consists of spontaneous and emotionally colored speech. As further evidence for the effectiveness of LSTM modeling for keyword spotting, results on the CHiME task are shown.