This paper investigates two relatively new measures of retrieval effectiveness, Bpref and RankEff, in relation to the problem of incomplete relevance data. Both measures disregard documents that have not been judged for relevance. The two measures are compared theoretically and experimentally; the experimental comparison also involves a third, well-known measure, mean uninterpolated average precision. The results indicate that RankEff is the most stable of the three measures when the amount of relevance data is reduced, both with respect to system ranking and with respect to absolute values. In addition, RankEff has the lowest error rate.
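To make the contrast concrete, below is a minimal Python sketch (not code from the paper) of two of the three measures the abstract discusses: uninterpolated average precision, which in effect treats unjudged documents as nonrelevant, and Bpref in one common formulation following Buckley and Voorhees, which ignores unjudged documents entirely. RankEff is omitted, and all function and variable names are illustrative assumptions.

```python
def average_precision(ranking, relevant):
    # Uninterpolated AP: any document outside `relevant` (including unjudged
    # ones) silently lowers precision at later relevant documents.
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant) if relevant else 0.0

def bpref(ranking, relevant, nonrelevant):
    # Bpref (one common formulation): each retrieved relevant document is
    # penalised by the fraction of judged-nonrelevant documents ranked above
    # it; unjudged documents do not affect the score at all.
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    bound = min(R, N)
    nonrel_seen, total = 0, 0.0
    for doc in ranking:
        if doc in relevant:
            penalty = min(nonrel_seen, bound) / bound if bound else 0.0
            total += 1.0 - penalty
        elif doc in nonrelevant:
            nonrel_seen += 1
        # documents in neither judged set are unjudged and are skipped
    return total / R

# Toy example: d2 and d5 are unjudged. They drag down average precision
# (precision at d4 is 2/4) but leave Bpref's penalty untouched.
run = ["d1", "d2", "d3", "d4", "d5"]
rel, nonrel = {"d1", "d4"}, {"d3"}
print(average_precision(run, rel))  # 0.75
print(bpref(run, rel, nonrel))      # 0.5
```

The toy run illustrates the abstract's premise: as judgments are removed, a measure that only counts judged documents changes more gracefully than one that penalises every unjudged document as if it were nonrelevant.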