A statistical method for system evaluation using incomplete judgments
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Bias and the limits of pooling
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Estimating average precision with incomplete and imperfect judgments
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
A simple and efficient sampling method for estimating AP and NDCG
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Relevance assessment: are judges exchangeable and does it matter
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating new search engine configurations with pre-existing judgments and clicks
Proceedings of the 20th international conference on World wide web
Evaluation of information retrieval for E-discovery
Artificial Intelligence and Law
How fresh do you want your search results?
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
Test collections are most useful when they are reusable, that is, when they can be reliably used to rank systems that did not contribute to the pools. Pooled relevance judgments for very large collections may not be reusable for two easons: they will be very sparse and not sufficiently complete, and they may be biased in the sense that theywill unfairly rank some class of systems. The TREC 2006 terabyte track judged both a pool and a deep random sample in order to measure the effects of sparseness and bias.