Test collections are extensively used in the evaluation of information retrieval systems. Crucial to their use is the degree to which results obtained from them predict user effectiveness. Early studies failed to establish a relationship between system effectiveness and user effectiveness, but more recent work has begun to find correlations. The results of this paper strengthen and extend those findings. We introduce a novel methodology for investigating the relationship, which succeeds in establishing a significant correlation between system and user effectiveness. We show that users behave differently on, and can discern differences between, pairs of systems whose absolute difference in test collection effectiveness is very small. Our results reinforce the use of test collections in IR evaluation, confirming that user effectiveness can be predicted from test collection results.
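The central claim is a correlation between batch (test collection) effectiveness and user effectiveness across systems. As a minimal sketch of how such a relationship can be tested, assuming hypothetical per-system scores (the paper's actual methodology and data are not reproduced here), one could compute a rank correlation:

    # Hypothetical sketch, not the paper's methodology: test whether per-system
    # batch effectiveness predicts per-system user effectiveness via rank correlation.
    from scipy.stats import spearmanr

    # Assumed illustrative scores: a batch metric (e.g., P@10 or MAP) from a test
    # collection, and a user-study measure (e.g., task success rate) per system.
    system_effectiveness = [0.35, 0.42, 0.47, 0.51, 0.58, 0.63]
    user_effectiveness = [0.52, 0.55, 0.61, 0.60, 0.68, 0.71]

    rho, p_value = spearmanr(system_effectiveness, user_effectiveness)
    print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")
    # A significant positive rho indicates that test collection scores predict
    # user effectiveness across the compared systems.

Spearman's rho is used in this sketch rather than Pearson's r because it assumes only a monotonic relationship between the two scores, not a linear one.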