This paper investigates two relatively new measures of retrieval effectiveness, Bpref and RankEff, in relation to the problem of incomplete relevance data. Both measures disregard documents that have not been judged for relevance. The two measures are compared theoretically and experimentally; the experimental comparison also involves a third, well-known measure, mean uninterpolated average precision. The results indicate that RankEff is the most stable of the three measures when the amount of relevance data is reduced, both with respect to system ranking and with respect to absolute values. In addition, RankEff has the lowest error rate.
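To make the contrast concrete, below is a minimal Python sketch (not code from the paper) of two of the three measures the abstract discusses: uninterpolated average precision, which in effect treats unjudged documents as nonrelevant, and Bpref in one common formulation following Buckley and Voorhees, which ignores unjudged documents entirely. RankEff is omitted, and all function and variable names are illustrative assumptions.

```python
def average_precision(ranking, relevant):
    # Uninterpolated AP: any document outside `relevant` (including unjudged
    # ones) silently lowers precision at later relevant documents.
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant) if relevant else 0.0

def bpref(ranking, relevant, nonrelevant):
    # Bpref (one common formulation): each retrieved relevant document is
    # penalised by the fraction of judged-nonrelevant documents ranked above
    # it; unjudged documents do not affect the score at all.
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    bound = min(R, N)
    nonrel_seen, total = 0, 0.0
    for doc in ranking:
        if doc in relevant:
            penalty = min(nonrel_seen, bound) / bound if bound else 0.0
            total += 1.0 - penalty
        elif doc in nonrelevant:
            nonrel_seen += 1
        # documents in neither judged set are unjudged and are skipped
    return total / R

# Toy example: d2 and d5 are unjudged. They drag down average precision
# (precision at d4 is 2/4) but leave Bpref's penalty untouched.
run = ["d1", "d2", "d3", "d4", "d5"]
rel, nonrel = {"d1", "d4"}, {"d3"}
print(average_precision(run, rel))  # 0.75
print(bpref(run, rel, nonrel))      # 0.5
```

The toy run illustrates the abstract's premise: as judgments are removed, a measure that only counts judged documents changes more gracefully than one that penalises every unjudged document as if it were nonrelevant.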