A Latent Dirichlet Framework for Relevance Modeling

  • Authors:
  • Viet Ha-Thuc; Padmini Srinivasan

  • Affiliations:
  • Computer Science Department, The University of Iowa, Iowa City, USA 52246 (both authors)

  • Venue:
  • AIRS '09: Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
  • Year:
  • 2009

Abstract

Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo-relevant) to a topic. However, these models assume that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. This assumption can limit model robustness and effectiveness. In this study, we propose a Latent Dirichlet relevance model, which relaxes this assumption. Our approach derives from current research on Latent Dirichlet Allocation (LDA) topic models. LDA has been extensively explored, especially for discovering a set of topics from a corpus. LDA itself, however, has a limitation that is also addressed in our work. Topics generated by LDA from a corpus are synthetic, i.e., they do not necessarily correspond to topics identified by humans for the same corpus. In contrast, our model explicitly considers the relevance relationships between documents and given topics (queries). Thus, unlike standard LDA, our model is directly applicable to goals such as relevance feedback for query modification and text classification, where topics (classes and queries) are provided upfront. Although the focus of our paper is on improving relevance-based language models, in effect our approach bridges relevance-based language models and LDA, addressing limitations of both.
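For context, here is a minimal sketch (not taken from the paper itself) of the two models the abstract contrasts. In the classic relevance model of Lavrenko and Croft, the relevance distribution is estimated from the (pseudo-)relevant set R, implicitly letting every token of a relevant document contribute; in LDA, each token w_i instead receives its own topic assignment z_i, which is the per-token flexibility the proposed model builds on. The symbols Q, R, alpha, theta_d, and phi_z below follow common conventions for these models, not notation from this paper.

```latex
% Classic relevance model: every token of a (pseudo-)relevant
% document D contributes to the relevance distribution for query Q.
P(w \mid R) \;\approx\; \sum_{D \in R} P(w \mid D)\, P(D \mid Q)

% LDA generative process: each token w_i draws its own topic z_i,
% so a single document can mix topically relevant and
% non-relevant content rather than being relevant as a whole.
\theta_d \sim \mathrm{Dir}(\alpha), \qquad
z_i \sim \mathrm{Mult}(\theta_d), \qquad
w_i \sim \mathrm{Mult}(\phi_{z_i})
```

Under the per-token view, only tokens whose assignment z_i points at the query's topic need be treated as relevant, which is exactly the all-tokens-relevant assumption the abstract says is relaxed.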