Cross-language retrieval of spontaneous speech combines the challenges of noisy automatic transcription with those of language translation. The CLEF 2005 Cross-Language Speech Retrieval (CL-SR) task provides a standard test collection for investigating these challenges. We show that retrieval performance can be improved in three ways: by careful selection of the term-weighting scheme; by decomposing automatic transcripts into phonetic substrings to mitigate transcription errors; and by combining automatic transcripts with manually assigned metadata. We further show that translating topics with online machine translation resources yields effective CL-SR.