Point process models for event-based speech recognition

Authors:
Aren Jansen; Partha Niyogi
Affiliations:
University of Chicago, Department of Computer Science, 1100 E 58th Street, Chicago, IL 60637, United States (both authors)
Venue:
Speech Communication
Year:
2009

Abstract

Several strands of research in linguistics, speech perception, and neuroethology suggest that modelling the temporal dynamics of a landmark-based representation of acoustic events is a scientifically plausible approach to the automatic speech recognition (ASR) problem. Adopting a point process representation of the speech signal opens up ASR to a large class of statistical models that have seen wide application in the neuroscience community. In this paper, we formulate several point process models for application to speech recognition, designed to operate on sparse detector-based representations of the speech signal. We find that even with a noisy and extremely sparse phone-based point process representation, obstruent phones can be decoded at accuracy levels comparable to a basic hidden Markov model baseline and with improved robustness. We conclude by outlining various avenues for future development of our methodology.