On bias problem in relevance feedback

Authors:
Qianli Xing;Yi Zhang;Lanbo Zhang
Affiliations:
Tsinghua University, Beijing, China;University of California, Santa Cruz, Santa Cruz, CA, USA;University of California, Santa Cruz, Santa Cruz, CA, USA
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 6
Cited 0

Model-based feedback in the language modeling approach to information retrieval

Proceedings of the tenth international conference on Information and knowledge management
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Active feedback in ad hoc information retrieval

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A cluster-based resampling method for pseudo-relevance feedback

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Incorporating diversity and density in active learning for relevance feedback

ECIR'07 Proceedings of the 29th European conference on IR research
On identifying representative relevant documents

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

Relevance feedback is an effective approach to improve retrieval quality over the initial query. Typical relevance feedback methods usually select top-ranked documents for relevance judgments, then query expansion or model updating are carried out based on the feedback documents. However, the number of feedback documents is usually limited due to expensive human labeling. Thus relevant documents in the feedback set are hardly representative of all relevant documents and the feedback set is actually biased. As a result, the performance of relevance feedback will get hurt. In this paper, we first show how and where the bias problem exists through experiments. Then we study how the bias can be reduced by utilizing the unlabeled documents. After analyzing the usefulness of a document to relevance feedback, we propose an approach that extends the feedback set with carefully selected unlabeled documents by heuristics. Our experiment results show that the extended feedback set has less bias than the original feedback set and better performance can be achieved when the extended feedback set is used for relevance feedback.