Efficient construction of large test collections
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Information Retrieval
Retrieval evaluation with incomplete information
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval system evaluation: effort, sensitivity, and reliability
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Minimal test collections for retrieval evaluation
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
A statistical method for system evaluation using incomplete judgments
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Hypothesis testing with incomplete relevance judgments
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Semiautomatic evaluation of retrieval systems using document similarities
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
On rank correlation and the distance between rankings
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Towards methods for the collective gathering and quality control of relevance assessments
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Multiple testing in statistical analysis of systems-based information retrieval experiments
ACM Transactions on Information Systems (TOIS)
Hi-index | 0.00 |
We define a method to estimate the random and systematic errors resulting from incomplete relevance assessments.Mean Average Precision (MAP) computed over a large number of topics with a shallow assessment pool substantially outperforms -- for the same adjudication effort MAP computed over fewer topics with deeper pools, and P@k computed with pools of the same depth. Move-to-front pooling,previously reported to yield substantially better rank correlation, yields similar power, and lower bias, compared tofixed-depth pooling.