Spoken Document Retrieval Based on Approximated Sequence Alignment

Authors:
Pere R. Comas;Jordi Turmo
Affiliations:
TALP Research Center, Technical University of Catalonia (UPC),;TALP Research Center, Technical University of Catalonia (UPC),
Venue:
TSD '08 Proceedings of the 11th international conference on Text, Speech and Dialogue
Year:
2008

Citing 12
Cited 3

Automatic text processing

Automatic text processing
Phonetic confusion matrix based spoken document retrieval

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness

ACM Transactions on Information Systems (TOIS)
Term Weighting Approaches in Automatic Text Retrieval

Term Weighting Approaches in Automatic Text Retrieval
High-performance, open-domain question answering from large text collections

High-performance, open-domain question answering from large text collections
Algorithms for language reconstruction

Algorithms for language reconstruction
A new algorithm for the alignment of phonetic sequences

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Investigating cross-language speech retrieval for a spontaneous conversational speech collection

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Using various indexing schemes and multiple translations in the CL-SR task at CLEF 2005

CLEF'05 Proceedings of the 6th international conference on Cross-Language Evalution Forum: accessing Multilingual Information Repositories
CLEF-2005 CL-SR at maryland: document and query expansion using side collections and thesauri

CLEF'05 Proceedings of the 6th international conference on Cross-Language Evalution Forum: accessing Multilingual Information Repositories
Overview of the CLEF-2006 cross-language speech retrieval track

CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval
Dublin City University at CLEF 2006: cross-language speech retrieval (CL-SR) experiments

CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval

Robust question answering for speech transcripts: UPC experience in QAst 2009

CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
Combining word and phonetic-code representations for spoken document retrieval

CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Sibyl, a factoid question-answering system for spoken documents

ACM Transactions on Information Systems (TOIS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents a new approach to spoken document information retrieval for spontaneous speech corpora. The classical approach to this problem is the use of an automatic speech recognizer (ASR) combined with standard information retrieval techniques. However, ASRs tend to produce transcripts of spontaneous speech with significant word error rate, which is a drawback for standard retrieval techniques. To overcome such a limitation, our method is based on an approximated sequence alignment algorithm to search "sounds like" sequences. Our approach does not depend on extra information from the ASR and outperforms up to 7 points the precision of state-of-the-art techniques in our experiments.