Discriminative keyword spotting

Authors:
Joseph Keshet;David Grangier;Samy Bengio
Affiliations:
IDIAP Research Institute, Rue Marconi 19, CH-1920 Martigny, Switzerland;NEC Labs America, 4 Independence Way, Princeton, NJ 08540, United States;Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States
Venue:
Speech Communication
Year:
2009

Abstract

This paper proposes a new approach to keyword spotting based on large-margin and kernel methods rather than on HMMs. Unlike previous approaches, the proposed method employs a discriminative learning procedure whose training phase aims at achieving a high area under the ROC curve, since this quantity is the most common measure for evaluating keyword spotters. The keyword spotter we devise maps the input acoustic representation of the speech utterance, together with the target keyword, into a vector space. Building on large-margin and kernel techniques for predicting whole sequences, our keyword spotter reduces to a classifier in this vector space that separates speech utterances in which the keyword is uttered from those in which it is not. We describe a simple iterative algorithm for training the keyword spotter and discuss its formal properties, showing theoretically that it attains a high area under the ROC curve. Experiments on read speech with the TIMIT corpus show that the resulting discriminative system outperforms the conventional context-independent HMM-based system. Further experiments with the TIMIT-trained model, tested on both read speech (HTIMIT, WSJ) and spontaneous speech (OGI Stories), show that, without further training or adaptation to the new corpus, our discriminative system outperforms the conventional context-independent HMM-based system.
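The area under the ROC curve mentioned in the abstract has a simple pairwise reading: it equals the fraction of (keyword-present, keyword-absent) utterance pairs in which the keyword-present utterance receives the higher score (the Wilcoxon-Mann-Whitney statistic). The sketch below illustrates only that evaluation measure. The function name `auc_from_scores` and the score values are hypothetical and chosen for illustration; they are not taken from the paper, and the code does not implement the paper's training algorithm.

```python
from itertools import product


def auc_from_scores(pos_scores, neg_scores):
    """Area under the ROC curve as a pairwise ranking statistic.

    pos_scores: spotter scores for utterances that contain the keyword.
    neg_scores: spotter scores for utterances that do not contain it.
    Returns the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each group")
    correct = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical scores, for illustration only.
pos = [2.1, 0.4, 1.7]   # keyword present
neg = [0.3, 0.9, -1.2]  # keyword absent
auc = auc_from_scores(pos, neg)
print(auc)  # 8 of the 9 pairs are ordered correctly
```

A spotter that scores every keyword-present utterance above every keyword-absent one reaches a value of 1.0, which is why a training criterion that penalizes misordered pairs is a natural fit for this measure.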