A critical investigation of recall and precision as measures of retrieval system performance
ACM Transactions on Information Systems (TOIS)
Determining the effectiveness of retrieval algorithms
Information Processing and Management: an International Journal
The pragmatics of information retrieval experimentation, revisited
Information Processing and Management: an International Journal - Special issue on evaluation issues in information retrieval
Presenting results of experimental retrieval comparisons
Information Processing and Management: an International Journal - Special issue on evaluation issues in information retrieval
The relevance of recall and precision in user evaluation
Journal of the American Society for Information Science - Special issue: relevance research
Overview of the second text retrieval conference (TREC-2)
TREC-2 Proceedings of the second conference on Text retrieval conference
Evaluation of evaluation in information retrieval
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Variations in relevance assessments and the measurement of retrieval effectiveness
Journal of the American Society for Information Science - Special issue: evaluation of information retrieval systems
Journal of the American Society for Information Science - Special topic issue on the history of documentation and information science: part II
How reliable are the results of large-scale information retrieval experiments?
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Measures of relative relevance and ranked half-life: performance indicators for interactive IR
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Managing gigabytes (2nd ed.): compressing and indexing documents and images
When information retrieval measures agree about the relative quality of document rankings
Journal of the American Society for Information Science
Information Retrieval
Cumulated gain-based evaluation of IR techniques
ACM Transactions on Information Systems (TOIS)
The Philosophy of Information Retrieval Evaluation
CLEF '01 Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems
Current Status of the Evaluation of Information Retrieval
Journal of Medical Systems
Measuring retrieval effectiveness: a new proposal and a first experimental validation
Journal of the American Society for Information Science and Technology
Retrieval evaluation with incomplete information
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Binary and graded relevance in IR evaluations: comparison of the effects on ranking of IR systems
Information Processing and Management: an International Journal
A utility theoretic approach to determining optimal wait times in distributed information retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Accurately interpreting clickthrough data as implicit feedback
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval system evaluation: effort, sensitivity, and reliability
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
When will information retrieval be "good enough"?
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A geometric interpretation of r-precision and its correlation with average precision
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Minimal test collections for retrieval evaluation
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Statistical precision of information retrieval evaluation
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
A statistical method for system evaluation using incomplete judgments
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Reliable information retrieval evaluation with incomplete and biased judgements
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Strategic system comparisons via targeted relevance judgments
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Score standardization for inter-collection comparison of retrieval systems
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Ranking the NTCIR systems based on multigrade relevance
AIRS'04 Proceedings of the 2004 international conference on Asian Information Retrieval Technology
Comparing metrics across TREC and NTCIR: the robustness to pool depth bias
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Precision-at-ten considered redundant
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Comparing metrics across TREC and NTCIR: the robustness to system bias
Proceedings of the 17th ACM conference on Information and knowledge management
Application of Information Retrieval Techniques for Source Code Authorship Attribution
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Score adjustment for correction of pooling bias
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Modeling Expected Utility of Multi-session Information Distillation
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
An Effectiveness Measure for Ambiguous and Underspecified Queries
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Building a framework for the probability ranking principle by a family of expected weighted rank
ACM Transactions on Information Systems (TOIS)
Improvements that don't add up: ad-hoc retrieval results since 1998
Proceedings of the 18th ACM conference on Information and knowledge management
Expected reciprocal rank for graded relevance
Proceedings of the 18th ACM conference on Information and knowledge management
Click-based evidence for decaying weight distributions in search effectiveness metrics
Information Retrieval
Visualizing differences in web search algorithms using the expected weighted hoeffding distance
Proceedings of the 19th international conference on World wide web
A user behavior model for average precision and its generalization to graded judgments
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Human performance and retrieval precision revisited
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
PRES: a score metric for evaluating recall-oriented information retrieval applications
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
A similarity measure for indefinite rankings
ACM Transactions on Information Systems (TOIS)
Extended Boolean retrieval for systematic biomedical reviews
ACSC '10 Proceedings of the Thirty-Third Australasian Conference on Computer Science - Volume 102
Score aggregation techniques in retrieval experimentation
ADC '09 Proceedings of the Twentieth Australasian Conference on Australasian Database - Volume 92
Web search solved?: all result rankings the same?
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Expected browsing utility for web search evaluation
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Visualizations for the spyglass ontology-based information analysis and retrieval system
Proceedings of the 48th Annual Southeast Regional Conference
A comparative analysis of cascade measures for novelty and diversity
Proceedings of the fourth ACM international conference on Web search and data mining
Ranking from pairs and triplets: information quality, evaluation methods and query complexity
Proceedings of the fourth ACM international conference on Web search and data mining
Optimizing two-dimensional search results presentation
Proceedings of the fourth ACM international conference on Web search and data mining
BDTEX: A GQM-based Bayesian approach for the detection of antipatterns
Journal of Systems and Software
Evaluation effort, reliability and reusability in XML retrieval
Journal of the American Society for Information Science and Technology
An analysis of NP-completeness in novelty and diversity ranking
Information Retrieval
Evaluating new search engine configurations with pre-existing judgments and clicks
Proceedings of the 20th international conference on World wide web
On the informativeness of cascade and intent-aware effectiveness measures
Proceedings of the 20th international conference on World wide web
Evaluation of information retrieval for E-discovery
Artificial Intelligence and Law
Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
System effectiveness, user models, and user utility: a conceptual framework for investigation
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Evaluating diversified search results using per-intent graded relevance
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Evaluating multi-query sessions
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
What deliberately degrading search quality tells us about discount functions
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Model-based inference about IR systems
ICTIR'11 Proceedings of the Third international conference on Advances in information retrieval theory
Rank and relevance in novelty and diversity metrics for recommender systems
Proceedings of the fifth ACM conference on Recommender systems
Discounted cumulative gain and user decision models
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
TOPSIG: topology preserving document signatures
Proceedings of the 20th ACM international conference on Information and knowledge management
Simulating simple user behavior for system effectiveness evaluation
Proceedings of the 20th ACM international conference on Information and knowledge management
Time-based calibration of effectiveness measures
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Evaluating aggregated search pages
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Top-k learning to rank: labeling, ranking and evaluation
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
A utility-theoretic ranking method for semi-automated text classification
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Advances on the development of evaluation measures
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Generic subset ranking using binary classifiers
Theoretical Computer Science
Modeling user variance in time-biased gain
Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval
Incorporating variability in user behavior into systems based evaluation
Proceedings of the 21st ACM international conference on Information and knowledge management
Contextual evaluation of query reformulations in a search session by user simulation
Proceedings of the 21st ACM international conference on Information and knowledge management
Models and metrics: IR evaluation as a user process
Proceedings of the Seventeenth Australasian Document Computing Symposium
Model Based Comparison of Discounted Cumulative Gain and Average Precision
Journal of Discrete Algorithms
Applying reinforcement learning for web pages ranking algorithms
Applied Soft Computing
Using intent information to model user behavior in diversified search
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Ranked accuracy and unstructured distributed search
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Journal of Web Engineering
How query cost affects search behavior
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
A mutual information-based framework for the analysis of information retrieval systems
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
A general evaluation measure for document organization tasks
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Preference based evaluation measures for novelty and diversity
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Predictive model performance: offline and online evaluations
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
On the reliability and intuitiveness of aggregated search metrics
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Users versus models: what observation tells us about effectiveness metrics
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Increasing evaluation sensitivity to diversity
Information Retrieval
The water filling model and the cube test: multi-dimensional evaluation for professional search
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods
ACM Transactions on Information Systems (TOIS)
Information quality measurement of medical encoding support based on usability
Computer Methods and Programs in Biomedicine
Exploiting user disagreement for web search evaluation: an experimental approach
Proceedings of the 7th ACM international conference on Web search and data mining
Contextual and dimensional relevance judgments for reusable SERP-level evaluation
Proceedings of the 23rd international conference on World wide web
Improving ranking performance with cost-sensitive ordinal classification via regression
Information Retrieval
Evaluation in Music Information Retrieval
Journal of Intelligent Information Systems
A range of methods for measuring the effectiveness of information retrieval systems has been proposed. These are typically intended to provide a quantitative single-value summary of a document ranking relative to a query. However, many of these measures have failings. For example, recall is not well founded as a measure of satisfaction, since the user of an actual system cannot judge recall. Average precision is derived from recall and suffers from the same problem; in addition, it lacks key stability properties that are needed for robust experiments. In this article, we introduce a new effectiveness metric, rank-biased precision, that avoids these problems. Rank-biased precision is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even when only partial relevance judgments are available.
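The abstract's claims can be illustrated with a minimal sketch of the rank-biased precision computation: a user inspects ranks top-down, continuing from one rank to the next with persistence probability p, so the score is a geometrically weighted sum of per-rank relevance. The function name and binary relevance input here are illustrative, not taken from the paper's code.

```python
def rbp(relevance, p=0.8):
    """Rank-biased precision for a ranking, given per-rank relevance
    values in [0, 1] and a user-persistence parameter p.

    Returns (score, residual): the residual is the most the score
    could rise if every unjudged document below the evaluated depth
    turned out to be fully relevant -- this is how RBP quantifies
    uncertainty under partial judgments.
    """
    score = (1 - p) * sum(r * p ** i for i, r in enumerate(relevance))
    residual = p ** len(relevance)
    return score, residual

# Example: judgments to depth 5, an impatient user (p = 0.5).
score, residual = rbp([1, 1, 0, 0, 1], p=0.5)
# score = 0.78125, residual = 0.03125
```

Note that extending the ranking to greater depth only shrinks the residual and adds non-negative terms to the score, which is the robustness-to-depth property the abstract highlights.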