Predicting document effectiveness in pseudo relevance feedback

Authors:
Mostafa Keikha;Jangwon Seo;W. Bruce Croft;Fabio Crestani
Affiliations:
Faculty of Informatics, Lugano, Switzerland;Center for Intelligent Information Retrieval, Amherst, MA, USA;Center for Intelligent Information Retrieval, Amherst, MA, USA;Faculty of Informatics, Lugano, Switzerland
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

Pseudo relevance feedback (PRF) is one of the effective practices in Information Retrieval. In particular, PRF via the relevance model (RM) has been widely used due to its theoretical soundness and effectiveness. In a PRF scenario, an underlying relevance model is inferred by combining the language models of the top retrieved documents, where the contribution of each document is assumed to be proportional to its score for the initial query. However, it is not clear that selecting and weighting the top retrieved documents by their initial retrieval scores alone is the optimal approach to query expansion. We show that the initial score of a document is not a good indicator of its effectiveness in query expansion. Our experiments show that if we could estimate the true effectiveness of the top retrieved documents, we could obtain almost 50% improvement over RM. Based on this observation, we introduce various document features that can be used to estimate the effectiveness of documents. Our experiments on the TREC Robust collection show that the proposed features make good predictors, and that PRF using the effectiveness predictors achieves statistically significant improvements over RM.
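
The mixing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes unsmoothed maximum-likelihood document language models, non-empty documents, and positive weights, and the function names, toy documents, and weight values are all hypothetical. The only point it makes is that the same mixture can be driven either by initial retrieval scores (as in RM) or by some predicted effectiveness value per document.

```python
from collections import Counter

def doc_language_model(tokens):
    """Maximum-likelihood unigram model P(w|d) for one document (no smoothing)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def relevance_model(docs, weights):
    """Mix document models: P(w|R) = sum_d (weight_d / sum of weights) * P(w|d)."""
    norm = sum(weights)
    rm = Counter()
    for tokens, weight in zip(docs, weights):
        for w, p in doc_language_model(tokens).items():
            rm[w] += (weight / norm) * p
    return dict(rm)

# Hypothetical toy data: two top-retrieved documents for some query.
top_docs = [
    ["feedback", "query", "expansion"],
    ["query", "sports", "news"],
]
initial_scores = [0.6, 0.4]           # assumed initial retrieval scores
predicted_effectiveness = [0.9, 0.1]  # assumed output of an effectiveness predictor

# Baseline: documents contribute in proportion to their initial scores.
rm_baseline = relevance_model(top_docs, initial_scores)
# Variant: documents contribute in proportion to predicted effectiveness.
rm_predicted = relevance_model(top_docs, predicted_effectiveness)
```

In this toy setting, shifting weight toward the first document raises the probability of its terms in the expanded query model, while terms shared by both documents are unaffected by the reweighting.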