A probabilistic model of information retrieval: development and comparative experiments
Information Processing and Management: an International Journal
Relevance based language models
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Subword-based approaches for spoken document retrieval
Subword-based approaches for spoken document retrieval
Content-based multimedia information retrieval: State of the art and challenges
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Regularized estimation of mixture models for robust pseudo-relevance feedback
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
LDA-based document models for ad-hoc retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Position specific posterior lattices for indexing speech
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Soft indexing of speech content for search in spoken documents
Computer Speech and Language
Latent concept expansion using markov random fields
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Indexing confusion networks for morph-based spoken document retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Information Retrieval
Introduction to Information Retrieval
Statistical Language Models for Information Retrieval A Critical Review
Foundations and Trends in Information Retrieval
Spoken information retrieval for turkish broadcast news
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
An audio indexing system for election video material
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
General indexation of weighted automata: application to spoken utterance retrieval
SpeechIR '04 Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004
A comparative study of methods for estimating query language models with pseudo feedback
Proceedings of the 18th ACM conference on Information and knowledge management
Statistical lattice-based spoken document retrieval
ACM Transactions on Information Systems (TOIS)
Latent semantic indexing (LSI) fails for TREC collections
ACM SIGKDD Explorations Newsletter
Regularized latent semantic indexing
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Probabilistic latent semantic analysis
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Approaches to reduce the effects of OOV queries on indexed spoken audio
IEEE Transactions on Multimedia
Interactive Spoken Document Retrieval With Suggested Key Terms Ranked by a Markov Decision Process
IEEE Transactions on Audio, Speech, and Language Processing
Hi-index | 0.00 |
In a text context, document/query expansion has proven very useful in retrieving objects semantically related to the query. However, when applying text-based techniques on spoken content, the inevitable recognition errors seriously degrade performance even when the retrieval process is performed over lattices. We propose the estimation of more accurate term distributions (or unigram language models) for the spoken documents by acoustic similarity graphs. In this approach, a graph is constructed for each term describing the acoustic similarity among all signal regions hypothesized to be the considered term. Score propagation based on a random walk over the graph offers more reliable scores of the term hypotheses, which in turn yield more accurate term distributions (or unigram language models). This approach was applied with the language modeling retrieval approach, including using document expansion based on latent topic analysis and query expansion with a query-regularized mixture model. We extend these approaches from words to subword n-grams, and the query expansion from document-level to utterance-level and from term-based to topic-based. Experiments performed on Mandarin broadcast news showed improved performance under almost all tested conditions.