Word and sub-word indexing approaches for reducing the effects of OOV queries on spoken audio

  • Authors:
  • Beth Logan;Pedro Moreno;Om Deshmukh

  • Affiliations:
  • One Cambridge Center, Cambridge, Massachusetts, United States;One Cambridge Center, Cambridge, Massachusetts, United States;One Cambridge Center, Cambridge, Massachusetts, United States

  • Venue:
  • HLT '02 Proceedings of the second international conference on Human Language Technology Research
  • Year:
  • 2002

Abstract

We explore the problem of out-of-vocabulary (OOV) queries in audio indexing systems by comparing three indexing methods on a broadcast news repository containing 75 hours of audio. The three systems are word-based, phoneme-based, and a novel system based on syllable-like units called particles. To better examine the performance of these approaches, we use a query set in which the percentage of OOVs has been artificially increased to 50%. We additionally investigate whether combining the three indexing techniques can improve retrieval, exploring several simple strategies such as weighted combinations. We find that combining word and sub-word based systems results in improved retrieval performance.
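
To make the idea of a weighted combination concrete, the following is a minimal sketch of linear score fusion across the three index types. The abstract does not specify the combination formula, the weights, or the score scale, so the function name `combine_scores`, the example weights, and the assumption that each system returns comparable per-document scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def combine_scores(
    system_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Fuse per-document scores from several indexing systems.

    system_scores maps a system name (e.g. "word") to {doc_id: score}.
    weights maps a system name to its weight (illustrative values only).
    A document missing from a system contributes 0 for that system.
    """
    combined: Dict[str, float] = {}
    for system, doc_scores in system_scores.items():
        w = weights.get(system, 0.0)
        for doc_id, score in doc_scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + w * score
    # Highest combined score first; ties broken by document id.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores for one query. The word index has no entry for
# "d2", as could happen when the query term is OOV for the recognizer.
scores = {
    "word": {"d1": 1.0},
    "phoneme": {"d1": 0.5, "d2": 0.8},
    "particle": {"d2": 0.6},
}
weights = {"word": 0.5, "phoneme": 0.25, "particle": 0.25}  # assumed values
ranked = combine_scores(scores, weights)
```

In this sketch a document that the word index misses can still be ranked through its phoneme and particle scores, which is the intuition behind combining word and sub-word systems for OOV queries.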