Do batch and user evaluations give the same results?

  • Authors:
  • William Hersh; Andrew Turpin; Susan Price; Benjamin Chan; Dale Kramer; Lynetta Sacherek; Daniel Olson

  • Affiliations:
  • Division of Medical Informatics & Outcomes Research, Oregon Health Sciences University, Portland, OR (all authors)

  • Venue:
  • SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2000

Abstract

Do improvements in system performance demonstrated by batch evaluations confer the same benefit for real users? We carried out experiments designed to investigate this question. After identifying a weighting scheme that gave maximum improvement over the baseline in a non-interactive evaluation, we used it with real users searching on an instance recall task. Our results showed that the weighting scheme that was beneficial in batch studies did not improve performance for real users. Further analysis did, however, identify other factors predictive of instance recall, including the number of documents saved by the user, document recall, and the number of documents seen by the user.
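
For readers unfamiliar with the outcome measure, instance recall is the fraction of the distinct relevant instances known for a topic that a searcher identifies during a session. The sketch below is a minimal illustration of that definition, not code from the paper; the function name and the example sets are hypothetical.

```python
def instance_recall(instances_found: set[str], instances_relevant: set[str]) -> float:
    """Fraction of the known relevant instances that the searcher identified."""
    if not instances_relevant:
        return 0.0
    return len(instances_found & instances_relevant) / len(instances_relevant)

# Hypothetical example: a topic with 6 known relevant instances,
# of which the user identifies 4 during the session.
found = {"i1", "i2", "i3", "i4"}
relevant = {"i1", "i2", "i3", "i4", "i5", "i6"}
print(instance_recall(found, relevant))  # 0.6666...
```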