A Theoretical Analysis of Pseudo-Relevance Feedback Models

Authors:
Stéphane Clinchant;Eric Gaussier
Affiliations:
Xerox Research Center Europe;LIG, Univ. Grenoble I
Venue:
Proceedings of the 2013 Conference on the Theory of Information Retrieval
Year:
2013

Abstract

Our goal in this study is to compare several widely used pseudo-relevance feedback (PRF) models and to understand what explains their respective behavior. To do so, we first analyze how different PRF models behave through the characteristics of the terms they select and through their performance on two widely used test collections. This analysis reveals that several well-known models, surprisingly, tend to select very common terms with low IDF (inverse document frequency). We then introduce several conditions that PRF models should satisfy regarding both the terms they select and the way they weight them, before examining whether standard PRF models satisfy these conditions. This study reveals that most models are deficient with respect to at least one condition, and that this deficiency explains the results of our analysis of the behavior of the models, as well as some of the results reported on the respective performance of PRF models. Based on the PRF conditions, we finally propose possible corrections for the simple mixture model. The PRF models obtained after these corrections outperform their standard versions and yield state-of-the-art PRF models, which confirms the validity of our theoretical analysis.
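
The term-level diagnostic mentioned in the abstract (checking whether selected expansion terms have low IDF) can be illustrated with a minimal sketch. This is not the authors' code or any of the PRF models they study: the selection rule here is a plain frequency count over the feedback documents, the IDF variant is the unsmoothed log(N / df), and the function names `idf` and `mean_idf_of_top_terms` and the toy collection are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def idf(term, collection):
    """Unsmoothed IDF, log(N / df). Assumes the term occurs in at least one document."""
    df = sum(1 for doc in collection if term in doc)
    return math.log(len(collection) / df)


def mean_idf_of_top_terms(feedback_docs, collection, k):
    """Pick the k most frequent terms in the feedback documents (an assumed,
    deliberately naive selection rule) and return them with their mean IDF."""
    counts = Counter(term for doc in feedback_docs for term in doc)
    top = [term for term, _ in counts.most_common(k)]
    mean = sum(idf(term, collection) for term in top) / len(top)
    return top, mean


# Toy data (hypothetical): the first two documents play the role of the
# top-ranked "feedback" documents, and the collection has four documents.
feedback_docs = [
    ["the", "the", "retrieval", "model"],
    ["the", "feedback", "model"],
]
collection = feedback_docs + [
    ["the", "query", "expansion"],
    ["the", "language", "model"],
]

top_terms, mean_idf = mean_idf_of_top_terms(feedback_docs, collection, k=2)
print(top_terms, round(mean_idf, 4))
print(round(idf("feedback", collection), 4))
```

On this toy data the frequency-based rule picks terms that occur in most of the collection, so their mean IDF is far below that of a rarer term such as "feedback". This is the kind of behavior the analysis attributes to several well-known models.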