On improving pseudo-relevance feedback using pseudo-irrelevant documents

Authors:
Karthik Raman;Raghavendra Udupa;Pushpak Bhattacharya;Abhijit Bhole
Affiliations:
Indian Institute of Technology Bombay, Mumbai;Microsoft Research India, Bangalore;Indian Institute of Technology Bombay, Mumbai;Microsoft Research India, Bangalore
Venue:
ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
Year:
2010

Citing 4

Relevance based language models. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
Model-based feedback in the language modeling approach to information retrieval. Proceedings of the tenth international conference on Information and knowledge management.
LIBLINEAR: A Library for Large Linear Classification. The Journal of Machine Learning Research.
Statistical Language Models for Information Retrieval: A Critical Review. Foundations and Trends in Information Retrieval.

Cited 2

Improving retrieval accuracy of difficult queries through generalizing negative document language models. Proceedings of the 20th ACM international conference on Information and knowledge management.
Proximity-based Rocchio's model for pseudo relevance. SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval.

Abstract

Pseudo-Relevance Feedback (PRF) assumes that the top-ranking n documents of the initial retrieval are relevant and extracts expansion terms from them. In this work, we introduce the notion of pseudo-irrelevant documents, i.e., high-scoring documents outside the top n that are highly unlikely to be relevant. We show how pseudo-irrelevant documents can be used to extract better expansion terms from the top-ranking n documents: good expansion terms are those that discriminate the top-ranking n documents from the pseudo-irrelevant documents. Our approach gives substantial improvements in retrieval performance over Model-based Feedback on several test collections.
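
The idea of discriminating expansion terms can be illustrated with a minimal sketch. The abstract does not specify how terms are scored, so the smoothed log-odds score below is an assumption chosen for simplicity and is not the authors' method. The function name `discriminative_terms`, its parameters, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def discriminative_terms(pseudo_relevant, pseudo_irrelevant, k=5, smoothing=1.0):
    """Rank candidate expansion terms by a smoothed log-odds score.

    Both inputs are lists of tokenized documents (lists of strings).
    Assumption: log-odds of add-one-smoothed term probabilities stands in
    for whatever discriminative scoring the paper actually uses.
    """
    rel_counts = Counter(t for doc in pseudo_relevant for t in doc)
    irr_counts = Counter(t for doc in pseudo_irrelevant for t in doc)
    vocab_size = len(set(rel_counts) | set(irr_counts))
    rel_total = sum(rel_counts.values()) + smoothing * vocab_size
    irr_total = sum(irr_counts.values()) + smoothing * vocab_size

    scores = {}
    for term in rel_counts:  # candidates come only from the top-n documents
        p_rel = (rel_counts[term] + smoothing) / rel_total
        p_irr = (irr_counts[term] + smoothing) / irr_total
        scores[term] = math.log(p_rel / p_irr)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy example: the query "jaguar" is ambiguous between the car and the animal.
top_n = [["jaguar", "car", "engine", "speed"],
         ["jaguar", "car", "dealer", "engine"]]
pseudo_irrelevant = [["jaguar", "cat", "jungle", "speed"],
                     ["jaguar", "cat", "habitat"]]
expansion = discriminative_terms(top_n, pseudo_irrelevant, k=3)
print(expansion)  # ['car', 'engine', 'dealer']
```

In this toy case the query term "jaguar" and the shared term "speed" occur in both sets, so they score low, while terms specific to the top-ranking documents rise to the top.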