A simple and efficient sampling method for estimating AP and NDCG

Authors:
Emine Yilmaz;Evangelos Kanoulas;Javed A. Aslam
Affiliations:
Microsoft Research, Cambridge, United Kingdom;Northeastern University, Boston, MA, USA;Northeastern University, Boston, MA, USA and Microsoft Research, Cambridge, United Kingdom
Venue:
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2008

Citing 11
Cited 34

The Cranfield tests on index language devices

Readings in information retrieval
Cumulated gain-based evaluation of IR techniques

ACM Transactions on Information Systems (TOIS)
An empirical study of smoothing techniques for language modeling

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Retrieval evaluation with incomplete information

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Minimal test collections for retrieval evaluation

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
A statistical method for system evaluation using incomplete judgments

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Estimating average precision with incomplete and imperfect judgments

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Alternatives to Bpref

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
On the robustness of relevance measures with incomplete judgments

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Strategic system comparisons via targeted relevance judgments

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
A comparison of pooled and sampled relevance judgments

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

Relevance assessment: are judges exchangeable and does it matter

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Score adjustment for correction of pooling bias

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Towards methods for the collective gathering and quality control of relevance assessments

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
TREC-CHEM: large scale chemical information retrieval evaluation at TREC

ACM SIGIR Forum
On statistical analysis and optimization of information retrieval effectiveness metrics

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Report on INEX 2009

ACM SIGIR Forum
Online stratified sampling: evaluating classifiers at web-scale

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Aspects and analysis of patent test collections

PaIR '10 Proceedings of the 3rd international workshop on Patent information retrieval
Why finding entities in Wikipedia is difficult, sometimes

Information Retrieval
Overview of the INEX 2009 entity ranking track

INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Crowdsourcing for search evaluation

ACM SIGIR Forum
Crowdsourcing for search and data mining

ACM SIGIR Forum
ReFER: effective relevance feedback for entity ranking

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Query modeling for entity search based on terms, categories, and examples

ACM Transactions on Information Systems (TOIS)
Prioritizing relevance judgments to improve the construction of IR test collections

Proceedings of the 20th ACM international conference on Information and knowledge management
Toward interactive training and evaluation

Proceedings of the 20th ACM international conference on Information and knowledge management
A fast MAP adaptation technique for gmm-supervector-based video semantic indexing systems

MM '11 Proceedings of the 19th ACM international conference on Multimedia
Crowdsourcing for information retrieval

ACM SIGIR Forum
Category-based query modeling for entity search

ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
Cross-Language Latent Relational Search between Japanese and English Languages Using a Web Corpus

ACM Transactions on Asian Language Information Processing (TALIP)
A ranking framework for entity oriented search using Markov random fields

Proceedings of the 1st Joint International Workshop on Entity-Oriented and Semantic Search
Exploiting the category structure of Wikipedia for entity ranking

Artificial Intelligence
Crowdsourcing for information retrieval: introduction to the special issue

Information Retrieval
Large-scale visual concept detection with explicit kernel maps and power mean SVM

Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
A mutual information-based framework for the analysis of information retrieval systems

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Inferring conceptual relationships to improve medical records search

Proceedings of the 10th Conference on Open Research Areas in Information Retrieval
Learning to handle negated language in medical records search

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
The TREC Medical Records Track

Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
A new statistical strategy for pooling: ELI

Information Processing Letters
Choices in batch information retrieval evaluation

Proceedings of the 18th Australasian Document Computing Symposium
Evaluation in Music Information Retrieval

Journal of Intelligent Information Systems
Retina enhanced SURF descriptors for spatio-temporal concept detection

Multimedia Tools and Applications
Semantic concept-enriched dependence model for medical information retrieval

Journal of Biomedical Informatics

Abstract

We consider the problem of large-scale retrieval evaluation. Recently, two methods based on random sampling were proposed to reduce the extensive effort required to judge tens of thousands of documents. The first method, proposed by Aslam et al. [1], is quite accurate and efficient, but it is overly complex, which makes it difficult for the community to use. The second method, infAP, proposed by Yilmaz et al. [14], is relatively simple, but it is less efficient than the former because it employs uniform random sampling from the set of complete judgments. Further, neither of these methods provides confidence intervals on the estimated values. The contribution of this paper is threefold: (1) we derive confidence intervals for infAP; (2) we extend infAP to incorporate nonrandom relevance judgments by employing stratified random sampling, hence combining the efficiency of stratification with the simplicity of random sampling; and (3) we describe how this approach can be used to estimate nDCG from incomplete judgments. We validate the proposed methods using TREC data and demonstrate that these new methods can be used to incorporate nonrandom samples, as were available in the TREC Terabyte track '06.
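To make the idea of stratified random sampling concrete, the following is a minimal Python sketch of a textbook stratified estimator. It is not the estimator derived in the paper. It only illustrates how judging a random sample within each stratum of a ranked list can yield an estimate of the number of relevant documents. The function names (`sample_strata`, `estimate_relevant_count`), the strata boundaries, and the synthetic judgments are all assumptions made for illustration.

```python
import random


def sample_strata(strata, sizes, rng):
    """Draw a simple random sample without replacement from each stratum."""
    return [rng.sample(docs, n) for docs, n in zip(strata, sizes)]


def estimate_relevant_count(strata, samples, judgments):
    """Stratified estimate of the number of relevant documents.

    Each stratum contributes (stratum size) * (fraction of its sampled
    documents that were judged relevant). This is a generic stratified
    estimator, shown as an illustration only.
    """
    total = 0.0
    for docs, sampled in zip(strata, samples):
        if not sampled:
            # Simplification: a stratum with no judged documents adds nothing.
            continue
        relevant_fraction = sum(judgments[d] for d in sampled) / len(sampled)
        total += len(docs) * relevant_fraction
    return total


# Hypothetical data: a ranked list of 100 documents with synthetic judgments.
rng = random.Random(0)
ranking = [f"d{i}" for i in range(100)]
judgments = {d: int(i % 4 == 0) for i, d in enumerate(ranking)}

# Assumed strata: sample the top of the ranking more heavily than the tail.
strata = [ranking[:10], ranking[10:50], ranking[50:]]
samples = sample_strata(strata, [10, 10, 5], rng)

estimate = estimate_relevant_count(strata, samples, judgments)
print(f"estimated number of relevant documents: {estimate:.1f}")
```

Sampling the top stratum exhaustively and the lower strata sparsely reflects the trade-off the abstract describes: judging effort is concentrated where it is most informative, while each stratum is still sampled at random.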