Improving keyword spotting with a tandem BLSTM-DBN architecture

  • Authors:
  • Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll — Institute for Human-Machine Communication, Technische Universität München, Germany
  • Alex Graves — Institute for Computer Science VI, Technische Universität München, Germany

  • Venue:
  • NOLISP'09 Proceedings of the 2009 international conference on Advances in Nonlinear Speech Processing
  • Year:
  • 2009

Abstract

We propose a novel architecture for keyword spotting which is composed of a Dynamic Bayesian Network (DBN) and a bidirectional Long Short-Term Memory (BLSTM) recurrent neural network. The DBN uses a hidden garbage variable as well as the concept of switching parents to discriminate between keywords and arbitrary speech. Contextual information is incorporated by a BLSTM network, which provides a discrete phoneme prediction feature for the DBN. The DBN then processes this discrete BLSTM output together with continuous acoustic features in order to detect keywords. Due to the flexible design of our tandem BLSTM-DBN recognizer, new keywords can be added to the vocabulary without re-training the model. Furthermore, our concept does not require the training of an explicit garbage model. Experiments on the TIMIT corpus show that incorporating a BLSTM network into the DBN architecture can increase true positive rates by up to 10%.
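The tandem feature fusion described above can be illustrated with a minimal sketch: a BLSTM emits framewise phoneme posteriors, the most probable phoneme per frame becomes a discrete feature, and each frame's continuous acoustic vector is paired with that discrete prediction to form the observation stream for the DBN. The function names and toy values below are illustrative stand-ins, not the paper's actual implementation; the BLSTM and DBN themselves are not modeled here.

```python
# Hypothetical sketch of the tandem feature fusion: discrete BLSTM
# phoneme predictions are paired with continuous acoustic features
# to form the hybrid observation stream that the DBN processes.

def blstm_phoneme_prediction(posteriors):
    """Collapse framewise phoneme posteriors (one probability list per
    frame) to a discrete prediction: the index of the most probable
    phoneme in each frame."""
    return [max(range(len(frame)), key=frame.__getitem__)
            for frame in posteriors]

def tandem_observations(acoustic_frames, posteriors):
    """Pair each continuous feature vector (e.g. MFCCs) with the
    discrete BLSTM prediction for the same frame."""
    discrete = blstm_phoneme_prediction(posteriors)
    return list(zip(acoustic_frames, discrete))

# Toy example: 3 frames, 2-dim "MFCC" vectors, 3 phoneme classes.
mfccs = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4]]
post = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]
obs = tandem_observations(mfccs, post)
# obs[0] -> ([0.1, 0.2], 0): frame 1 pairs its acoustic vector with
# phoneme index 0, the arg-max of the first posterior distribution.
```

In the actual system the discrete feature enters the DBN as an additional observed variable alongside the continuous acoustics; this sketch only shows how the two streams are aligned frame by frame.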