Reliable information retrieval evaluation with incomplete and biased judgements

  • Authors:
  • Stefan Büttcher (University of Waterloo); Charles L. A. Clarke (University of Waterloo); Peter C. K. Yeung (University of Waterloo); Ian Soboroff (National Institute of Standards and Technology)

  • Venue:
  • SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

  • Year:
  • 2007

Abstract

Information retrieval evaluation based on the pooling method is inherently biased against systems that did not contribute to the pool of judged documents. This may distort the results obtained about the relative quality of the systems evaluated and thus lead to incorrect conclusions about the performance of a particular ranking technique. We examine the magnitude of this effect and explore how it can be countered by automatically building an unbiased set of judgements from the original, biased judgements obtained through pooling. We compare the performance of this method with other approaches to the problem of incomplete judgements, such as bpref, and show that the proposed method leads to higher evaluation accuracy, especially if the set of manual judgements is rich in documents, but highly biased against some systems.
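
The abstract names bpref as one of the existing approaches to incomplete judgements against which the proposed method is compared. As background, the sketch below shows a standard per-topic bpref computation (in the spirit of Buckley and Voorhees' measure, which scores only judged documents and ignores unjudged ones); the function name and interface are illustrative and are not taken from the paper.

```python
def bpref(ranking, relevant, nonrelevant):
    """Minimal sketch of per-topic bpref.

    ranking     -- list of document ids in ranked order
    relevant    -- set of judged relevant document ids
    nonrelevant -- set of judged nonrelevant document ids
    Unjudged documents are ignored, which is the point of bpref.
    """
    R = len(relevant)
    N = len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N)
    score = 0.0
    nonrel_seen = 0  # judged nonrelevant documents ranked so far
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            if denom == 0:
                # No judged nonrelevant documents: each relevant doc scores 1.
                score += 1.0
            else:
                # 1 - (nonrelevant docs ranked above, capped) / min(R, N)
                score += 1.0 - min(nonrel_seen, denom) / denom
    return score / R
```

Because unjudged documents contribute nothing to the score, bpref is less sensitive to incomplete pools than measures such as average precision, which effectively treat unjudged documents as nonrelevant.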