A critical investigation of recall and precision as measures of retrieval system performance
ACM Transactions on Information Systems (TOIS)
Determining the effectiveness of retrieval algorithms
Information Processing and Management: an International Journal
The pragmatics of information retrieval experimentation, revisited
Information Processing and Management: an International Journal - Special issue on evaluation issues in information retrieval
Presenting results of experimental retrieval comparisons
Information Processing and Management: an International Journal - Special issue on evaluation issues in information retrieval
The relevance of recall and precision in user evaluation
Journal of the American Society for Information Science - Special issue: relevance research
Overview of the second text retrieval conference (TREC-2)
TREC-2 Proceedings of the second conference on Text retrieval conference
Evaluation of evaluation in information retrieval
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Variations in relevance assessments and the measurement of retrieval effectiveness
Journal of the American Society for Information Science - Special issue: evaluation of information retrieval systems
Journal of the American Society for Information Science - Special topic issue on the history of documentation and information science: part II
How reliable are the results of large-scale information retrieval experiments?
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Measures of relative relevance and ranked half-life: performance indicators for interactive IR
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Managing gigabytes (2nd ed.): compressing and indexing documents and images
When information retrieval measures agree about the relative quality of document rankings
Journal of the American Society for Information Science
Information Retrieval
Cumulated gain-based evaluation of IR techniques
ACM Transactions on Information Systems (TOIS)
The Philosophy of Information Retrieval Evaluation
CLEF '01 Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems
Current Status of the Evaluation of Information Retrieval
Journal of Medical Systems
Measuring retrieval effectiveness: a new proposal and a first experimental validation
Journal of the American Society for Information Science and Technology
Retrieval evaluation with incomplete information
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Binary and graded relevance in IR evaluations: comparison of the effects on ranking of IR systems
Information Processing and Management: an International Journal
A utility theoretic approach to determining optimal wait times in distributed information retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Accurately interpreting clickthrough data as implicit feedback
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval system evaluation: effort, sensitivity, and reliability
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
When will information retrieval be "good enough"?
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A geometric interpretation of r-precision and its correlation with average precision
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Minimal test collections for retrieval evaluation
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Statistical precision of information retrieval evaluation
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
A statistical method for system evaluation using incomplete judgments
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Reliable information retrieval evaluation with incomplete and biased judgements
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Strategic system comparisons via targeted relevance judgments
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Score standardization for inter-collection comparison of retrieval systems
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Ranking the NTCIR systems based on multigrade relevance
AIRS'04 Proceedings of the 2004 international conference on Asian Information Retrieval Technology
Comparing metrics across TREC and NTCIR: the robustness to pool depth bias
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Precision-at-ten considered redundant
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Comparing metrics across TREC and NTCIR: the robustness to system bias
Proceedings of the 17th ACM conference on Information and knowledge management
Application of Information Retrieval Techniques for Source Code Authorship Attribution
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Score adjustment for correction of pooling bias
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Modeling Expected Utility of Multi-session Information Distillation
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
An Effectiveness Measure for Ambiguous and Underspecified Queries
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Building a framework for the probability ranking principle by a family of expected weighted rank
ACM Transactions on Information Systems (TOIS)
Improvements that don't add up: ad-hoc retrieval results since 1998
Proceedings of the 18th ACM conference on Information and knowledge management
Expected reciprocal rank for graded relevance
Proceedings of the 18th ACM conference on Information and knowledge management
Click-based evidence for decaying weight distributions in search effectiveness metrics
Information Retrieval
Visualizing differences in web search algorithms using the expected weighted hoeffding distance
Proceedings of the 19th international conference on World wide web
A user behavior model for average precision and its generalization to graded judgments
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Human performance and retrieval precision revisited
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
PRES: a score metric for evaluating recall-oriented information retrieval applications
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
A similarity measure for indefinite rankings
ACM Transactions on Information Systems (TOIS)
Extended Boolean retrieval for systematic biomedical reviews
ACSC '10 Proceedings of the Thirty-Third Australasian Conference on Computer Science - Volume 102
Score aggregation techniques in retrieval experimentation
ADC '09 Proceedings of the Twentieth Australasian Conference on Australasian Database - Volume 92
Web search solved?: all result rankings the same?
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Expected browsing utility for web search evaluation
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Visualizations for the spyglass ontology-based information analysis and retrieval system
Proceedings of the 48th Annual Southeast Regional Conference
A comparative analysis of cascade measures for novelty and diversity
Proceedings of the fourth ACM international conference on Web search and data mining
Ranking from pairs and triplets: information quality, evaluation methods and query complexity
Proceedings of the fourth ACM international conference on Web search and data mining
Optimizing two-dimensional search results presentation
Proceedings of the fourth ACM international conference on Web search and data mining
BDTEX: A GQM-based Bayesian approach for the detection of antipatterns
Journal of Systems and Software
Evaluation effort, reliability and reusability in XML retrieval
Journal of the American Society for Information Science and Technology
An analysis of NP-completeness in novelty and diversity ranking
Information Retrieval
Evaluating new search engine configurations with pre-existing judgments and clicks
Proceedings of the 20th international conference on World wide web
On the informativeness of cascade and intent-aware effectiveness measures
Proceedings of the 20th international conference on World wide web
Evaluation of information retrieval for E-discovery
Artificial Intelligence and Law
Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
System effectiveness, user models, and user utility: a conceptual framework for investigation
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Evaluating diversified search results using per-intent graded relevance
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Evaluating multi-query sessions
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
What deliberately degrading search quality tells us about discount functions
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Model-based inference about IR systems
ICTIR'11 Proceedings of the Third international conference on Advances in information retrieval theory
Rank and relevance in novelty and diversity metrics for recommender systems
Proceedings of the fifth ACM conference on Recommender systems
Discounted cumulative gain and user decision models
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
TOPSIG: topology preserving document signatures
Proceedings of the 20th ACM international conference on Information and knowledge management
Simulating simple user behavior for system effectiveness evaluation
Proceedings of the 20th ACM international conference on Information and knowledge management
Time-based calibration of effectiveness measures
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Evaluating aggregated search pages
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Top-k learning to rank: labeling, ranking and evaluation
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
A utility-theoretic ranking method for semi-automated text classification
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Advances on the development of evaluation measures
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Generic subset ranking using binary classifiers
Theoretical Computer Science
Modeling user variance in time-biased gain
Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval
Incorporating variability in user behavior into systems based evaluation
Proceedings of the 21st ACM international conference on Information and knowledge management
Contextual evaluation of query reformulations in a search session by user simulation
Proceedings of the 21st ACM international conference on Information and knowledge management
Models and metrics: IR evaluation as a user process
Proceedings of the Seventeenth Australasian Document Computing Symposium
Model Based Comparison of Discounted Cumulative Gain and Average Precision
Journal of Discrete Algorithms
Applying reinforcement learning for web pages ranking algorithms
Applied Soft Computing
Using intent information to model user behavior in diversified search
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Ranked accuracy and unstructured distributed search
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Journal of Web Engineering
How query cost affects search behavior
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
A mutual information-based framework for the analysis of information retrieval systems
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
A general evaluation measure for document organization tasks
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Preference based evaluation measures for novelty and diversity
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Predictive model performance: offline and online evaluations
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
On the reliability and intuitiveness of aggregated search metrics
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Users versus models: what observation tells us about effectiveness metrics
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Increasing evaluation sensitivity to diversity
Information Retrieval
The water filling model and the cube test: multi-dimensional evaluation for professional search
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods
ACM Transactions on Information Systems (TOIS)
Information quality measurement of medical encoding support based on usability
Computer Methods and Programs in Biomedicine
Exploiting user disagreement for web search evaluation: an experimental approach
Proceedings of the 7th ACM international conference on Web search and data mining
Contextual and dimensional relevance judgments for reusable SERP-level evaluation
Proceedings of the 23rd international conference on World wide web
Improving ranking performance with cost-sensitive ordinal classification via regression
Information Retrieval
Evaluation in Music Information Retrieval
Journal of Intelligent Information Systems
A range of methods for measuring the effectiveness of information retrieval systems has been proposed. These are typically intended to provide a quantitative single-value summary of a document ranking relative to a query. However, many of these measures have failings. For example, recall is not well founded as a measure of satisfaction, since the user of an actual system cannot judge recall. Average precision is derived from recall and suffers from the same problem; in addition, it lacks key stability properties that are needed for robust experiments. In this article, we introduce a new effectiveness metric, rank-biased precision, that avoids these problems. Rank-biased precision is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even when only partial relevance judgments are available.
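The abstract's claims can be illustrated with a minimal sketch of the rank-biased precision computation: a user inspects ranks top-down, continuing from one rank to the next with persistence probability p, so the score is a geometrically weighted sum of per-rank relevance. The function name and binary relevance input here are illustrative, not taken from the paper's code.

```python
def rbp(relevance, p=0.8):
    """Rank-biased precision for a ranking, given per-rank relevance
    values in [0, 1] and a user-persistence parameter p.

    Returns (score, residual): the residual is the most the score
    could rise if every unjudged document below the evaluated depth
    turned out to be fully relevant -- this is how RBP quantifies
    uncertainty under partial judgments.
    """
    score = (1 - p) * sum(r * p ** i for i, r in enumerate(relevance))
    residual = p ** len(relevance)
    return score, residual

# Example: judgments to depth 5, an impatient user (p = 0.5).
score, residual = rbp([1, 1, 0, 0, 1], p=0.5)
# score = 0.78125, residual = 0.03125
```

Note that extending the ranking to greater depth only shrinks the residual and adds non-negative terms to the score, which is the robustness-to-depth property the abstract highlights.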