Do batch and user evaluations give the same results?

  • Authors:
  • William Hersh; Andrew Turpin; Susan Price; Benjamin Chan; Dale Kramer; Lynetta Sacherek; Daniel Olson

  • Affiliations:
  • Division of Medical Informatics & Outcomes Research, Oregon Health Sciences University, Portland, OR (all authors)

  • Venue:
  • SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2000

Abstract

Do improvements in system performance demonstrated by batch evaluations confer the same benefit for real users? We carried out experiments designed to investigate this question. After identifying a weighting scheme that gave maximum improvement over the baseline in a non-interactive evaluation, we used it with real users searching on an instance recall task. Our results showed that the weighting scheme that was beneficial in batch studies did not improve performance for real users. Further analysis did, however, identify other factors predictive of instance recall, including the number of documents saved by the user, document recall, and the number of documents seen by the user.
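
For readers unfamiliar with the outcome measure, instance recall is the fraction of the distinct relevant instances known for a topic that a searcher identifies during a session. The sketch below is a minimal illustration of that definition, not code from the paper; the function name and the example sets are hypothetical.

```python
def instance_recall(instances_found: set[str], instances_relevant: set[str]) -> float:
    """Fraction of the known relevant instances that the searcher identified."""
    if not instances_relevant:
        return 0.0
    return len(instances_found & instances_relevant) / len(instances_relevant)

# Hypothetical example: a topic with 6 known relevant instances,
# of which the user identifies 4 during the session.
found = {"i1", "i2", "i3", "i4"}
relevant = {"i1", "i2", "i3", "i4", "i5", "i6"}
print(instance_recall(found, relevant))  # 0.6666...
```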