Information retrieval effectiveness evaluation typically takes one of two forms: batch experiments based on static test collections, or lab studies measuring actual users interacting with a system. Test collection experiments are sometimes viewed as introducing too many simplifying assumptions to accurately predict the usefulness of a system to its users. As a result, there is great interest in creating test collections and measures that better model user behavior. One line of research involves developing measures that include a parameterized user model; choosing a parameter value simulates a particular type of user. We propose that these measures offer an opportunity to more accurately simulate the variance due to user behavior, and thus to analyze system effectiveness with respect to a simulated user population. We introduce a Bayesian procedure for producing sampling distributions from click data, and show how to use statistical tools to quantify the effects of variance due to parameter selection.
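The idea can be illustrated with a minimal sketch. Here we assume rank-biased precision (RBP) as the parameterized measure, with its persistence parameter drawn from a Beta posterior fit to hypothetical click continue/stop counts (a uniform Beta(1, 1) prior); each draw simulates one user, and the resulting score distribution reflects variance due to parameter selection. The relevance vector and click counts below are illustrative, not from the paper.

```python
import random

def rbp(rels, p):
    """Rank-biased precision: expected utility for a user who
    moves from one result to the next with persistence p."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(rels))

def sample_p(continues, stops, rng):
    """Draw a persistence value from a Beta posterior whose counts
    come from observed continue/stop events (Beta(1, 1) prior)."""
    return rng.betavariate(continues + 1, stops + 1)

def rbp_distribution(rels, continues, stops, n=10000, seed=0):
    """Simulate a user population: each draw is one simulated
    user's persistence, yielding a distribution of RBP scores
    rather than a single point estimate."""
    rng = random.Random(seed)
    return [rbp(rels, sample_p(continues, stops, rng)) for _ in range(n)]

# Hypothetical binary relevance vector and click counts.
scores = rbp_distribution([1, 0, 1, 1, 0], continues=80, stops=20)
mean_score = sum(scores) / len(scores)
```

Standard statistical tools can then be applied to `scores` (e.g., credible intervals, or variance decomposition across systems) to quantify how much of the observed effectiveness difference is attributable to user variability.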