Several studies have found that the Cranfield approach to evaluation can report significant performance differences between retrieval systems even when little to no difference is found for humans completing tasks with those systems. We revisit the relationship between precision and human performance by measuring performance on tightly controlled search tasks, using user interfaces that offer limited interaction. We find that human performance and retrieval precision are strongly related. We also find that users change their relevance-judging behavior based on the precision of the results. This change in behavior, coupled with the well-known lack of perfect inter-assessor agreement, can reduce the measured performance gains predicted by increased precision.
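To make the attenuation argument concrete, here is a minimal Monte Carlo sketch (not from the paper) of how imperfect agreement between a user and the official assessor shrinks the precision the user actually experiences. The `agreement` parameter, modeling the probability that a user's judgment matches the official one, is a hypothetical simplification introduced here for illustration.

```python
import random

def experienced_precision(system_precision, agreement=0.8, k=10,
                          trials=10000, seed=0):
    """Estimate the precision a user perceives in a top-k result list.

    system_precision: fraction of the top-k documents the official
        assessor judged relevant (the measured precision).
    agreement: hypothetical probability that the user's relevance
        judgment matches the official assessor's judgment.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        hits = 0
        for _ in range(k):
            officially_relevant = rng.random() < system_precision
            agrees = rng.random() < agreement
            # The user perceives a document as relevant either by agreeing
            # with a relevant judgment or disagreeing with a non-relevant one.
            user_relevant = officially_relevant if agrees else not officially_relevant
            hits += user_relevant
        total += hits / k
    return total / trials

# A nominal precision gap of 0.9 vs 0.6 narrows as experienced by the user:
print(experienced_precision(0.9))  # ~0.74 with 80% agreement
print(experienced_precision(0.6))  # ~0.56 with 80% agreement
```

Under these illustrative assumptions, the experienced precision is p·a + (1 − p)(1 − a), so a measured gap of 0.30 shrinks to roughly 0.18 for the user, consistent with the attenuation effect described above.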