We consider the problem of optimally allocating a fixed budget to construct a test collection, with associated relevance judgments, such that it can (i) accurately evaluate the relative performance of the participating systems, and (ii) generalize to new, previously unseen systems. We propose a two-stage approach. For a given set of queries, we adopt the traditional pooling method and use a portion of the budget to judge a set of documents retrieved by the participating systems. Next, we analyze these relevance judgments to prioritize the queries and the remaining pooled documents for further assessment. The query prioritization is formulated as a convex optimization problem, which permits an efficient solution and provides a flexible framework for incorporating various constraints. Query-document pairs with the highest priority scores are then judged using the remaining budget. We evaluate our resource optimization approach on the TREC 2004 Robust track collection and demonstrate that our optimization techniques are cost-efficient and yield a significant improvement in the reusability of the resulting test collections.
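To make the second stage concrete, the following is a minimal sketch of how query prioritization under a judgment budget can be posed as a convex program. The abstract does not specify the paper's actual objective or constraints, so everything here is an assumption for illustration: the per-query utility scores u (which the paper would derive from the first-stage relevance judgments), the per-query judging costs c, the budget B, and the concave square-root utility are all hypothetical. The sketch uses the cvxpy modeling library.

import cvxpy as cp
import numpy as np

# Hypothetical inputs; in the paper these would come from the
# first-stage pooling and relevance judgments.
rng = np.random.default_rng(0)
n_queries = 50
u = rng.uniform(0.1, 1.0, n_queries)  # estimated value of further judging each query
c = rng.uniform(1.0, 5.0, n_queries)  # cost of judging each query's remaining pool
B = 40.0                              # remaining assessment budget

# Decision variable: fraction of each query's remaining pool to judge.
x = cp.Variable(n_queries)

# Concave objective (diminishing returns per query) makes this a
# convex problem; budget and box constraints are affine, and further
# constraints could be added to the list without changing the solver.
objective = cp.Maximize(u @ cp.sqrt(x))
constraints = [c @ x <= B, x >= 0, x <= 1]
cp.Problem(objective, constraints).solve()

# Rank queries by allocated judging effort; top-ranked query-document
# pairs would be sent to assessors first.
priority = np.argsort(-x.value)
print("Top-5 priority queries:", priority[:5])

The appeal of the convex formulation, as the abstract notes, is exactly what this toy shows: additional requirements (e.g., a minimum number of judged queries, or per-query budget caps) can be added as extra constraints while the problem remains efficiently solvable.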