21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval
Efficient construction of large test collections
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
How reliable are the results of large-scale information retrieval experiments?
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Variations in relevance judgments and the measurement of retrieval effectiveness
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A unified model for metasearch, pooling, and system evaluation
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
The maximum entropy method for analyzing retrieval measures
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Minimal test collections for retrieval evaluation
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
A statistical method for system evaluation using incomplete judgments
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Inferring document relevance via average precision
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Estimating average precision with incomplete and imperfect judgments
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Comparing metrics across TREC and NTCIR: the robustness to system bias
Proceedings of the 17th ACM conference on Information and knowledge management
Expected reciprocal rank for graded relevance
Proceedings of the 18th ACM conference on Information and knowledge management
Measuring the reusability of test collections
Proceedings of the third ACM international conference on Web search and data mining
Extended expectation maximization for inferring score distributions
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Evaluation in Music Information Retrieval
Journal of Intelligent Information Systems
Hi-index | 0.00 |
Recent work has shown that average precision can be accurately estimated from a small random sample of judged documents. Unfortunately, such "random pools" cannot be used to evaluate retrieval measures in any standard way. In this work, we show that given such estimates of average precision, one can accurately infer the relevances of the remaining unjudged documents, thus obtaining a fully judged pool that can be used in standard ways for system evaluation of all kinds. Using TREC data, we demonstrate that our inferred judged pools are well correlated with assessor judgments, and we further demonstrate that our inferred pools can be used to accurately infer precision recall curves and all commonly used measures of retrieval performance.