Approximating true relevance distribution from a mixture model based on irrelevance data

Authors:
Peng Zhang; Yuexian Hou; Dawei Song
Affiliations:
The Robert Gordon University, Aberdeen, United Kingdom; Tianjin University, Tianjin, China; The Robert Gordon University, Aberdeen, United Kingdom
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009


Abstract

Pseudo-relevance feedback (PRF), which is widely applied in IR, aims to derive a distribution from the top n pseudo-relevant documents D. However, these documents are often a mixture of relevant and irrelevant documents. As a result, the derived distribution is actually a mixture model, which has long limited the performance of PRF. This is particularly the case for difficult queries, where the truly relevant documents in D are very sparse. In this situation, it is often easier to identify a small number of seed irrelevant documents, which can form a seed irrelevance distribution. A fundamental and challenging problem then arises: based solely on the mixed distribution and a seed irrelevance distribution, how can we automatically generate an optimal approximation of the true relevance distribution? In this paper, we propose a novel distribution separation model (DSM) to tackle this problem and give theoretical justifications for the proposed algorithm. Evaluation results from extensive simulated experiments on several large-scale TREC data sets demonstrate the effectiveness of our method, which outperforms a well-respected PRF model, the Relevance Model (RM), as well as RM applied to D with the seed negative documents directly removed.
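The separation problem described in the abstract can be illustrated with a small sketch. This is not the paper's DSM: it assumes, purely for illustration, that the mixed distribution is a linear combination `mixed = lam * relevant + (1 - lam) * irrelevant` and that the mixing weight `lam` is already known, whereas the abstract only says the derived distribution is a mixture. The function name `separate_distribution` and its parameters are hypothetical.

```python
from typing import Dict


def separate_distribution(mixed: Dict[str, float],
                          irrelevant: Dict[str, float],
                          lam: float) -> Dict[str, float]:
    """Estimate a relevance distribution from a mixed term distribution.

    Assumption (not taken from the paper): the mixture is linear,
    mixed = lam * relevant + (1 - lam) * irrelevant, with 0 < lam <= 1.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")

    raw = {}
    for term, p_mixed in mixed.items():
        p_irr = irrelevant.get(term, 0.0)
        # Invert the assumed mixture; clip negatives, which appear when
        # the seed irrelevance estimate overshoots for a term.
        raw[term] = max((p_mixed - (1.0 - lam) * p_irr) / lam, 0.0)

    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("no probability mass left after separation")
    # Renormalise so the estimate is a proper distribution.
    return {term: p / total for term, p in raw.items()}


# Toy example: the query "jaguar" with animal-related relevant documents
# and car-related irrelevant ones, mixed with lam = 0.4.
mixed = {"jaguar": 0.5, "cat": 0.2, "car": 0.3}
irrelevant = {"jaguar": 0.5, "car": 0.5}
estimate = separate_distribution(mixed, irrelevant, lam=0.4)
```

In this toy case the estimate puts equal mass on "jaguar" and "cat" and none on "car". In practice the mixing weight is unknown and the seed irrelevance distribution covers only part of the irrelevant documents, which is the harder setting the abstract addresses.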