Point process models for spotting keywords in continuous speech

  • Authors:
  • Aren Jansen;Partha Niyogi

  • Affiliations:
  • Department of Computer Science, The University of Chicago, Chicago, IL;Department of Computer Science, The University of Chicago, Chicago, IL

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

We investigate the hypothesis that the linguistic content underlying human speech may be coded in the pattern of timings of various acoustic "events" (landmarks) in the speech signal. This hypothesis is supported by several strands of research in the fields of linguistics, speech perception, and neuroscience. In this paper, we put these scientific motivations to the test by formulating a point process-based computational framework for the task of spotting keywords in continuous speech. We find that even with a noisy and extremely sparse phonetic landmark-based point process representation, keywords can be spotted with accuracy levels comparable to recently studied hidden Markov model-based keyword spotting systems. We show that the performance of our keyword spotting system in the high-precision regime is better predicted by the median duration of the keyword rather than simply the number of its constituent syllables or phonemes. When we are confronted with very few (in the extreme case, zero) examples of the keyword in question, we find that constructing a keyword detector from its component syllable detectors provides a viable approach.