A lattice-based approach to query-by-example spoken document retrieval

  • Authors:
  • Tee Kiah Chia;Khe Chai Sim;Haizhou Li;Hwee Tou Ng

  • Affiliations:
  • National University of Singpaore, Singapore, Singapore;Institute for Infocomm Research, Singapore, Singapore;Institute for Infocomm Research, Singapore, Singapore;National University of Singapore, Singapore, Singapore

  • Venue:
  • Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent efforts on the task of spoken document retrieval (SDR) have made use of speech lattices: speech lattices contain information about alternative speech transcription hypotheses other than the 1-best transcripts, and this information can improve retrieval accuracy by overcoming recognition errors present in the 1-best transcription. In this paper, we look at using lattices for the query-by-example spoken document retrieval task - retrieving documents from a speech corpus, where the queries are themselves in the form of complete spoken documents (query exemplars). We extend a previously proposed method for SDR with short queries to the query-by-example task. Specifically, we use a retrieval method based on statistical modeling: we compute expected word counts from document and query lattices, estimate statistical models from these counts, and compute relevance scores as divergences between these models. Experimental results on a speech corpus of conversational English show that the use of statistics from lattices for both documents and query exemplars results in better retrieval accuracy than using only 1-best transcripts for either documents, or queries, or both. In addition, we investigate the effect of stop word removal which further improves retrieval accuracy. To our knowledge, our work is the first to have used a lattice-based approach to query-by-example spoken document retrieval.