Predicting search term reliability for spoken term detection systems

  • Authors:
  • Amir Hossein Torbati;Joseph Picone

  • Affiliations:
  • Department of Electrical and Computer Engineering, Temple University, Philadelphia, USA 19027;Department of Electrical and Computer Engineering, Temple University, Philadelphia, USA 19027

  • Venue:
  • International Journal of Speech Technology
  • Year:
  • 2014


Abstract

Spoken term detection is an extension of text-based searching that allows users to type keywords and search audio files containing recordings of spoken language. Performance depends on many external factors, such as the acoustic channel, language, pronunciation variations, and the acoustic confusability of the search term. Unlike in text-based searches, the likelihoods of false alarms and misses for specific search terms, which we refer to as reliability, play a significant role in the overall perception of the usability of the system. In this paper, we present a system that predicts the reliability of a search term based on its inherent confusability. Our approach integrates predictors of reliability that are based on both acoustic and phonetic features. These predictors are trained using an analysis of recognition errors produced by a state-of-the-art spoken term detection system operating on the Fisher Corpus. This work represents the first large-scale attempt to predict the success of a keyword search term from only its spelling. We explore the complex relationship between phonetic and acoustic properties of search terms. We show that a 76% correlation between the predicted error rate and the actual measured error rate can be achieved, and that the remaining confusability is due to other acoustic modeling issues that cannot be derived from a search term's spelling.
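The evaluation idea in the abstract, comparing a predicted error rate against a measured one, can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the abstract does not say which correlation coefficient was used (Pearson is assumed here), the function names `spelling_features` and `pearson` are hypothetical, and the letter-count features are crude stand-ins for the acoustic and phonetic predictors the paper describes, not the authors' feature set.

```python
import math

VOWELS = set("aeiou")


def spelling_features(term):
    """Crude spelling-only features for a search term.

    Illustrative stand-ins only; not the features used in the paper.
    """
    letters = [c for c in term.lower() if c.isalpha()]
    n_vowels = sum(1 for c in letters if c in VOWELS)
    return {
        "n_letters": len(letters),
        "n_vowels": n_vowels,
        "n_consonants": len(letters) - n_vowels,
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Assumed here as the measure for comparing predicted and measured
    per-term error rates.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In use, `pearson` would receive one list of predicted error rates and one list of measured error rates, aligned by search term. A value near 1 would mean the predictor ranks and scales terms much as the detection system's actual errors do.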