Cranfield-style information retrieval evaluation accounts for variance in user information needs by evaluating retrieval systems over a set of search topics. For each search topic, however, traditional metrics model all users as scanning the ranked list in exactly the same manner, and thus have zero variance in their per-topic estimates of effectiveness. Metrics that fail to model user variance overestimate the effect size of differences between retrieval systems. Modeling user variance is critical to understanding the impact of effectiveness differences on the actual user experience: if the variance is high relative to the size of a difference, the effect size is small, and real users are unlikely to notice the difference. Time-biased gain is an evaluation metric that models user interaction with ranked lists that are displayed using document surrogates. In this paper, we extend the stochastic simulation of time-biased gain to model the variation between users. We validate this new version of time-biased gain by showing that it produces distributions of gain that agree well with the actual distributions produced by real users. With a per-topic variance in its effectiveness measure, time-biased gain allows for the measurement of effect sizes, letting researchers understand the extent to which predicted performance improvements matter to real users.
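To make the mechanics concrete, below is a minimal sketch of a stochastic time-biased gain simulation with per-user variance. The exponential decay D(t) = exp(-t ln 2 / h) is time-biased gain's published discount function; everything else here, including the log-normal reading times, the click and save probabilities, and the function names, is an illustrative assumption rather than the authors' calibrated user model. Each simulated user accumulates a different gain, and the spread of those gains is the per-topic variance that makes an effect-size calculation possible.

```python
import math
import random
import statistics

HALF_LIFE = 224.0  # decay half-life h in seconds (assumed value for illustration)

def decay(t, half_life=HALF_LIFE):
    """TBG discount: probability a user is still searching at time t, D(t) = exp(-t ln 2 / h)."""
    return math.exp(-t * math.log(2) / half_life)

def simulate_tbg(relevance, rng, n_users=1000):
    """Simulate many users scanning one ranked list; return the distribution of gain.

    relevance: list of binary judgments by rank. Reading times, click
    decisions, and save decisions are drawn per user, so each user
    accumulates a different gain -- this is the per-topic variance that a
    deterministic metric lacks.
    """
    gains = []
    for _ in range(n_users):
        t, gain = 0.0, 0.0
        for rel in relevance:
            t += rng.lognormvariate(1.4, 0.5)        # time to read a summary (assumed parameters)
            p_click = 0.65 if rel else 0.4           # assumed click probabilities
            if rng.random() < p_click:
                t += rng.lognormvariate(3.0, 0.8)    # time to read the document (assumed)
                if rel and rng.random() < 0.8:       # assumed P(save | relevant, clicked)
                    gain += decay(t)                 # gain discounted by arrival time
        gains.append(gain)
    return gains

rng = random.Random(42)
run_a = simulate_tbg([1, 0, 1, 1, 0, 0, 1, 0, 0, 0], rng)
run_b = simulate_tbg([0, 1, 0, 1, 0, 1, 0, 0, 1, 0], rng)

# Effect size (Cohen's d): mean difference relative to pooled user variability.
diff = statistics.mean(run_a) - statistics.mean(run_b)
pooled_sd = math.sqrt((statistics.variance(run_a) + statistics.variance(run_b)) / 2)
print(f"mean difference = {diff:.3f}, effect size = {diff / pooled_sd:.2f}")
```

Under this sketch, a small mean difference between two runs can coincide with a wide spread of per-user gains, in which case the effect size is small and the difference is unlikely to matter to real users.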