Predicting document effectiveness in pseudo relevance feedback

  • Authors:
  • Mostafa Keikha; Jangwon Seo; W. Bruce Croft; Fabio Crestani

  • Affiliations:
  • Faculty of Informatics, Lugano, Switzerland; Center for Intelligent Information Retrieval, Amherst, MA, USA; Center for Intelligent Information Retrieval, Amherst, MA, USA; Faculty of Informatics, Lugano, Switzerland

  • Venue:
  • Proceedings of the 20th ACM International Conference on Information and Knowledge Management
  • Year:
  • 2011

Abstract

Pseudo relevance feedback (PRF) is one of the effective practices in information retrieval. In particular, PRF via the relevance model (RM) has been widely used because of its theoretical soundness and effectiveness. In a PRF scenario, an underlying relevance model is inferred by combining the language models of the top retrieved documents, where the contribution of each document is assumed to be proportional to its score for the initial query. However, it is not clear that selecting the top retrieved documents by their initial retrieval scores alone is the optimal approach to query expansion. We show that the initial score of a document is not a good indicator of its effectiveness in query expansion. Our experiments show that if we can estimate the true effectiveness of the top retrieved documents, we can obtain an improvement of almost 50% over RM. Based on this observation, we introduce various document features that can be used to estimate the effectiveness of documents. Our experiments on the TREC Robust collection show that the proposed features make good predictors, and that PRF using the effectiveness predictors can achieve statistically significant improvements over RM.
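
The mixture described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `relevance_model`, its parameters, and the use of unsmoothed maximum-likelihood document models are assumptions made for the example. The optional `weights` argument stands in for replacing the initial retrieval scores with some estimate of document effectiveness; the paper's actual features and predictors are not reproduced here.

```python
from collections import Counter


def relevance_model(docs, scores, weights=None):
    """Estimate P(term | R) as a weighted mixture of document language models.

    docs:    list of token lists (the top retrieved documents)
    scores:  initial retrieval scores, assumed non-negative
    weights: hypothetical alternative document weights (for example,
             predicted effectiveness); used instead of scores when given
    """
    mix = weights if weights is not None else scores
    # Drop empty documents so the result remains a proper distribution.
    pairs = [(tokens, w) for tokens, w in zip(docs, mix) if tokens]
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("document weights must sum to a positive value")

    model = Counter()
    for tokens, w in pairs:
        counts = Counter(tokens)
        length = len(tokens)
        for term, count in counts.items():
            # Maximum-likelihood P(term | doc), scaled by the normalized
            # contribution of this document to the mixture.
            model[term] += (w / total) * (count / length)
    return dict(model)


# Toy example: two feedback documents with made-up scores.
docs = [["a", "b"], ["a", "a"]]
by_score = relevance_model(docs, scores=[1.0, 3.0])
by_weight = relevance_model(docs, scores=[1.0, 3.0], weights=[3.0, 1.0])
```

In the toy example, swapping the document weights shifts probability mass from `a` toward `b`, which shows how the choice of document weighting changes the expansion terms that would be selected.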