Adaptive Sparseness for Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence
Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds
IEEE Transactions on Pattern Analysis and Machine Intelligence
Object Recognition with Features Inspired by Visual Cortex
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2 - Volume 02
Signal reconstruction in sensor arrays using sparse representations
Signal Processing - Sparse approximations in signal and image processing
Object Class Recognition and Localization Using Sparse Features with Limited Receptive Fields
International Journal of Computer Vision
Robust Face Recognition via Sparse Representation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Point process models for event-based speech recognition
Speech Communication
An application of recurrent neural networks to discriminative keyword spotting
ICANN'07 Proceedings of the 17th international conference on Artificial neural networks
Phoneme based acoustics keyword spotting in informal continuous speech
TSD'05 Proceedings of the 8th international conference on Text, Speech and Dialogue
Spoken keyword detection using autoassociative neural networks
International Journal of Speech Technology
Hi-index | 0.00 |
We investigate the hypothesis that the linguistic content underlying human speech may be coded in the pattern of timings of various acoustic "events" (landmarks) in the speech signal. This hypothesis is supported by several strands of research in the fields of linguistics, speech perception, and neuroscience. In this paper, we put these scientific motivations to the test by formulating a point process-based computational framework for the task of spotting keywords in continuous speech. We find that even with a noisy and extremely sparse phonetic landmark-based point process representation, keywords can be spotted with accuracy levels comparable to recently studied hidden Markov model-based keyword spotting systems. We show that the performance of our keyword spotting system in the high-precision regime is better predicted by the median duration of the keyword rather than simply the number of its constituent syllables or phonemes. When we are confronted with very few (in the extreme case, zero) examples of the keyword in question, we find that constructing a keyword detector from its component syllable detectors provides a viable approach.