Several strands of research in linguistics, speech perception, and neuroethology suggest that modelling the temporal dynamics of a representation based on acoustic event landmarks is a scientifically plausible approach to the automatic speech recognition (ASR) problem. Adopting a point process representation of the speech signal opens up ASR to a large class of statistical models that have seen wide application in the neuroscience community. In this paper, we formulate several point process models for speech recognition, designed to operate on sparse detector-based representations of the speech signal. We find that even with a noisy and extremely sparse phone-based point process representation, obstruent phones can be decoded at accuracy levels comparable to a basic hidden Markov model baseline and with improved robustness. We conclude by outlining various avenues for future development of our methodology.