Researchers and developers of IR systems generally want to make inferences about the effectiveness of their systems over a population of user needs, topics, or queries. The most common framework for this is statistical hypothesis testing, which involves computing the probability of observing the measured difference in effectiveness between two systems over a sample of topics under a null hypothesis that there is no real difference between them. It is less widely appreciated that these tests themselves rest on models of effectiveness. In this work we first describe the modeling assumptions of the t-test explicitly, then develop a Bayesian modeling approach that makes those assumptions explicit and easy to change for specific challenges in IR evaluation.
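As a rough illustration of the modeling assumption hidden inside the test, the sketch below computes a paired t statistic from per-topic scores of two systems. It is not taken from the paper: the function name `paired_t_statistic` and the score lists are hypothetical, and the model it encodes is the standard paired t-test assumption that per-topic differences are independent draws from a normal distribution whose mean is zero under the null hypothesis.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test.

    Assumed model (standard paired t-test, not specific to the paper):
    the per-topic differences d_i = a_i - b_i are i.i.d. normal with
    mean mu, and the null hypothesis is mu = 0.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same topics")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two topics")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    # Standard error of the mean difference is sd / sqrt(n).
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-topic effectiveness scores for two systems.
system_a = [0.42, 0.31, 0.58, 0.27, 0.49]
system_b = [0.38, 0.29, 0.51, 0.30, 0.41]
t_stat, dof = paired_t_statistic(system_a, system_b)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Turning the statistic into a p-value requires the Student t distribution with `dof` degrees of freedom, which is where the normality assumption on the differences enters.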