Effects of out of vocabulary words in spoken document retrieval (poster session)

  • Authors:
  • P. C. Woodland; S. E. Johnson; P. Jourlin; K. Spärck Jones

  • Affiliations:
  • Cambridge University, Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK; Cambridge University, Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK; Cambridge University, Computer Laboratory, Pembroke Street, Cambridge CB2 3QG, UK; Cambridge University, Computer Laboratory, Pembroke Street, Cambridge CB2 3QG, UK

  • Venue:
  • SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2000

Abstract

The effects of out-of-vocabulary (OOV) items in spoken document retrieval (SDR) are investigated. Several sets of transcriptions were created for the TREC-8 SDR task using a speech recognition system with varying vocabulary sizes and OOV rates, and the relative retrieval performance was measured. The effects of OOV terms on a simple baseline IR system and on more sophisticated retrieval systems are described. The use of a parallel corpus for query and document expansion is found to be especially beneficial, and, with this data set, good retrieval performance can be achieved even at fairly high OOV rates.
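
As an illustration of the quantity being varied, the following is a minimal sketch of how an OOV rate might be computed for a given recognizer vocabulary. It assumes the common definition (the fraction of running word tokens in a reference text that are absent from the vocabulary) and a vocabulary chosen as the most frequent training words; the abstract does not state how the authors built their vocabularies or measured OOV rates. The function names `oov_rate` and `top_k_vocabulary` and the toy word lists are hypothetical.

```python
from collections import Counter


def oov_rate(tokens, vocabulary):
    """Return the fraction of running tokens not found in the vocabulary.

    Assumption: OOV rate is measured over tokens (not distinct word types).
    An empty token list yields 0.0 to avoid division by zero.
    """
    if not tokens:
        return 0.0
    vocab = set(vocabulary)
    missing = sum(1 for token in tokens if token not in vocab)
    return missing / len(tokens)


def top_k_vocabulary(training_tokens, k):
    """Pick the k most frequent training words as a vocabulary (illustrative)."""
    counts = Counter(training_tokens)
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    training = "the cat sat on the mat the dog sat".split()
    reference = "the cat chased the dog on the lawn".split()
    for k in (2, 4, 6):
        vocab = top_k_vocabulary(training, k)
        print(f"vocabulary size {k}: OOV rate {oov_rate(reference, vocab):.3f}")
```

Shrinking `k` in this sketch raises the OOV rate, which loosely mirrors the idea of producing transcriptions under different vocabulary sizes; it does not reproduce the experimental setup.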