Estimating average precision with incomplete and imperfect judgments

Authors:
Emine Yilmaz;Javed A. Aslam
Affiliations:
Northeastern University, Boston, MA;Northeastern University, Boston, MA
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Citing 15
Cited 66

Overview of the second text retrieval conference (TREC-2)

TREC-2 Proceedings of the second conference on Text retrieval conference
The Cranfield tests on index language devices

Readings in information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval

21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval
Efficient construction of large test collections

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
How reliable are the results of large-scale information retrieval experiments?

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating evaluation measure stability

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Ranking retrieval systems without relevance judgments

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluation by highly relevant documents

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
On Collection Size and Retrieval Effectiveness

Information Retrieval
On the effectiveness of evaluating retrieval systems in the absence of relevance judgments

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic ranking of retrieval systems in imperfect environments

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
A unified model for metasearch, pooling, and system evaluation

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
An empirical study of smoothing techniques for language modeling

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Retrieval evaluation with incomplete information

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
The maximum entropy method for analyzing retrieval measures

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Reliable information retrieval evaluation with incomplete and biased judgements

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Alternatives to Bpref

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
On the robustness of relevance measures with incomplete judgments

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Strategic system comparisons via targeted relevance judgments

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
A comparison of pooled and sampled relevance judgments

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Problems with Kendall's tau

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Inexpensive fusion methods for enhancing feature detection

Image Communication
Inferring document relevance from incomplete information

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Semiautomatic evaluation of retrieval systems using document similarities

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
On information retrieval metrics designed for evaluation with incomplete relevance assessments

Information Retrieval
Retrieval sensitivity under training using different measures

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
A new rank correlation coefficient for information retrieval

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
A simple and efficient sampling method for estimating AP and NDCG

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluation over thousands of queries

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Relevance assessment: are judges exchangeable and does it matter

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Comparing metrics across TREC and NTCIR: the robustness to pool depth bias

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Estimating average precision when judgments are incomplete

Knowledge and Information Systems
Statistical power in retrieval experimentation

Proceedings of the 17th ACM conference on Information and knowledge management
Comparing metrics across TREC and NTCIR: the robustness to system bias

Proceedings of the 17th ACM conference on Information and knowledge management
Multi-cue fusion for semantic video indexing

MM '08 Proceedings of the 16th ACM international conference on Multimedia
Survey and evaluation of query intent detection methods

Proceedings of the 2009 workshop on Web Search Click Data
Nullification test collections for web spam and SEO

Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
If I Had a Million Queries

ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Measuring the Search Effectiveness of a Breadth-First Crawl

ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Score adjustment for correction of pooling bias

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Document selection methodologies for efficient and effective learning-to-rank

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Including summaries in system evaluation

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Improving Automatic Video Retrieval with Semantic Concept Detection

SCIA '09 Proceedings of the 16th Scandinavian Conference on Image Analysis
Building a framework for the probability ranking principle by a family of expected weighted rank

ACM Transactions on Information Systems (TOIS)
A few good topics: Experiments in topic set reduction for retrieval evaluation

ACM Transactions on Information Systems (TOIS)
Episode-constrained cross-validation in video concept retrieval

IEEE Transactions on Multimedia
Measuring the reusability of test collections

Proceedings of the third ACM international conference on Web search and data mining
Exploiting external knowledge to improve video retrieval

Proceedings of the international conference on Multimedia information retrieval
AdaOUBoost: adaptive over-sampling and under-sampling to boost the concept learning in large scale imbalanced data sets

Proceedings of the international conference on Multimedia information retrieval
The Pascal Visual Object Classes (VOC) Challenge

International Journal of Computer Vision
Classifier fusion for SVM-based multimedia semantic indexing

ECIR'07 Proceedings of the 29th European conference on IR research
The importance of anchor text for ad hoc search revisited

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
The effect of assessor error on IR system evaluation

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Extending average precision to graded relevance judgments

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
PRES: a score metric for evaluating recall-oriented information retrieval applications

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Assessor error in stratified evaluation

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Why finding entities in Wikipedia is difficult, sometimes

Information Retrieval
Refining video annotation by exploiting inter-shot context

Proceedings of the international conference on Multimedia
Examining the robustness of evaluation metrics for patent retrieval with incomplete relevance judgements

CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
Evaluation effort, reliability and reusability in XML retrieval

Journal of the American Society for Information Science and Technology
Evaluation of information retrieval for E-discovery

Artificial Intelligence and Law
Evaluating multi-query sessions

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
The effects of choice in routing relevance judgments

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Prioritizing relevance judgments to improve the construction of IR test collections

Proceedings of the 20th ACM international conference on Information and knowledge management
Optimizing the cost of information retrieval test collections

Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
Mining concept relationship in temporal context for effective video annotation

MM '11 Proceedings of the 19th ACM international conference on Multimedia
Collaborative video reindexing via matrix factorization

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Visual vocabulary optimization with spatial context for image annotation and classification

MMM'12 Proceedings of the 18th international conference on Advances in Multimedia Modeling
On smoothing average precision

ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Large vocabulary quantization for searching instances from videos

Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Combining inverted indices and structured search for ad-hoc object retrieval

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
An uncertainty-aware query selection model for evaluation of IR systems

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Temporal-Spatial refinements for video concept fusion

ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part III
A mutual information-based framework for the analysis of information retrieval systems

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
On Using Fewer Topics in Information Retrieval Evaluations

Proceedings of the 2013 Conference on the Theory of Information Retrieval
Query-Performance Prediction Using Minimal Relevance Feedback

Proceedings of the 2013 Conference on the Theory of Information Retrieval
Document Score Distribution Models for Query Performance Inference and Prediction

ACM Transactions on Information Systems (TOIS)
Enhanced and hierarchical structure algorithm for data imbalance problem in semantic extraction under massive video dataset

Multimedia Tools and Applications
Evaluation in Music Information Retrieval

Journal of Intelligent Information Systems
Retina enhanced SURF descriptors for spatio-temporal concept detection

Multimedia Tools and Applications
Semantic concept-enriched dependence model for medical information retrieval

Journal of Biomedical Informatics

Abstract

We consider the problem of evaluating retrieval systems using incomplete judgment information. Buckley and Voorhees recently demonstrated that retrieval systems can be efficiently and effectively evaluated using incomplete judgments via the bpref measure [6]. When relevance judgments are complete, the value of bpref is an approximation to the value of average precision using complete judgments. However, when relevance judgments are incomplete, the value of bpref deviates from this value, though it continues to rank systems in a manner similar to average precision evaluated with a complete judgment set. In this work, we propose three evaluation measures that (1) are approximations to average precision even when the relevance judgments are incomplete and (2) are more robust to incomplete or imperfect relevance judgments than bpref. The proposed estimates of average precision are simple and accurate, and we demonstrate the utility of these estimates using TREC data.
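
To make the comparison in the abstract concrete, the following is a minimal sketch of average precision and bpref computed over a single ranked list. It is not taken from the paper: the function names, the document identifiers, and the choice to follow the commonly cited bpref definition (each relevant document is penalized by the number of judged nonrelevant documents ranked above it, capped at R, the number of judged relevant documents) are assumptions made for illustration. The three estimators proposed in the paper are not reproduced here.

```python
def average_precision(ranking, relevant):
    """Average precision of a ranked list.

    Documents not in `relevant` (including unjudged ones) count as
    nonrelevant; relevant documents that are never retrieved contribute 0.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def bpref(ranking, relevant, nonrelevant):
    """bpref computed from judged documents only (assumed definition).

    Unjudged documents are skipped. Each retrieved relevant document
    scores 1 - min(n, R) / R, where n is the number of judged
    nonrelevant documents ranked above it and R = len(relevant).
    """
    r_count = len(relevant)
    if r_count == 0:
        return 0.0
    nonrel_seen, total = 0, 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            total += 1.0 - min(nonrel_seen, r_count) / r_count
    return total / r_count


# Hypothetical ranked list with complete judgments.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
full_rel = {"d1", "d3", "d6"}
full_nonrel = {"d2", "d4", "d5"}

# Hypothetical incomplete judgment set: d2, d3 and d5 are unjudged.
part_rel = {"d1", "d6"}
part_nonrel = {"d4"}

print("complete:  ", average_precision(ranking, full_rel),
      bpref(ranking, full_rel, full_nonrel))
print("incomplete:", average_precision(ranking, part_rel),
      bpref(ranking, part_rel, part_nonrel))
```

Running the sketch on both judgment sets shows how each measure's value shifts once some judgments are withheld, which is the behavior the abstract's robustness claims are about.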