Mixing and Merging for Spoken Document Retrieval

Authors:
Mark Sanderson;Fabio Crestani
Affiliations:
-;-
Venue:
ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
Year:
1998

Citing 6
Cited 3

Ranking algorithms

Information retrieval
Results of applying probabilistic IR to OCR text

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Query expansion using local and global document analysis

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Length Normalization in Degraded Text Collections

Video mail retrieval using voice: an overview of the stage 2 system

MIRO'95 Proceedings of the Final conference on Multimedia Information Retrieval
Speech retrieval based on automatic indexing

MIRO'95 Proceedings of the Final conference on Multimedia Information Retrieval

Speech and Hand Transcribed Retrieval

Information Retrieval Techniques for Speech Applications [this book is based on the workshop “Information Retrieval Techniques for Speech Applications”, held as part of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in New Orleans, USA, in September 2001].
An Investigation of Mixed-Media Information Retrieval

ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Spoken Content Retrieval: A Survey of Techniques and Technologies

Foundations and Trends in Information Retrieval

Abstract

This paper describes a number of experiments that explored the issues surrounding the retrieval of spoken documents. Two such issues were examined: first, how best to use speech recogniser output to achieve the highest retrieval effectiveness; second, the potential problems of retrieving from a so-called "mixed collection", i.e. one that contains documents from both a speech recognition system (producing many errors) and hand transcription (producing presumably near-perfect documents). The first part of the work found that merging the transcripts of multiple recognisers showed the most promise. The second part showed that the term weighting scheme used in a retrieval system was important in determining whether the system was detrimentally affected when retrieving from a mixed collection.