The standard system-based evaluation paradigm has focused on assessing the performance of retrieval systems in serving the best results for a single query. Real users, however, often begin an interaction with a search engine using an under-specified query that they will need to reformulate before they find what they are looking for. In this work we consider the problem of evaluating retrieval systems over test collections of multi-query sessions. We propose two families of measures: a model-free family that makes no assumption about the user's behavior over a session, and a model-based family built on a simple model of user interactions over the session. In both cases we generalize traditional evaluation metrics, such as average precision, to multi-query session evaluation. We demonstrate the behavior of the proposed metrics using the new TREC 2010 Session track collection and simulations over the TREC-9 Query track collection.
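To make the two families concrete, here is a minimal sketch of what they might look like, not the paper's exact formulations: the function names, the depth cutoff, and the p_down/p_reform parameters are illustrative assumptions. The model-free variant scores the session without any behavioral assumptions by applying ordinary average precision to the concatenated rankings of all reformulations; the model-based variant weights each examined rank by the probability that a simple stochastic user (who scans lists top-down and reformulates with fixed probability, in the spirit of rank-biased precision) ever reaches it.

```python
import itertools


def average_precision(rels):
    """Standard AP over a single binary relevance vector."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / i
    n_rel = sum(rels)
    return total / n_rel if n_rel else 0.0


def session_ap_model_free(session, depth=10):
    """Model-free session measure (illustrative): concatenate the
    top-`depth` results of each reformulation in order and score the
    combined list with ordinary AP. Nothing is assumed about where in
    the session the user stops."""
    combined = list(itertools.chain.from_iterable(r[:depth] for r in session))
    return average_precision(combined)


def expected_session_utility(session, p_down=0.8, p_reform=0.5):
    """Model-based session measure (illustrative): the user scans each
    ranked list top-down, moving to the next rank with probability
    `p_down` and, after abandoning a list, issuing the next
    reformulation with probability `p_reform`. Expected utility is the
    probability-weighted sum of relevance over all examined positions."""
    utility = 0.0
    p_session = 1.0                      # prob. of reaching the current query
    for rels in session:
        p_rank = p_session               # prob. of examining rank 1
        for r in rels:
            utility += p_rank * r
            p_rank *= p_down             # decay down the ranking
        p_session *= p_reform            # chance the user reformulates again
    return utility


# A hypothetical session: binary relevance vectors for three
# reformulations of a single information need.
session = [
    [0, 0, 1, 0, 0],   # first query: mostly off-target
    [1, 0, 1, 0, 0],   # after one reformulation
    [1, 1, 1, 0, 0],   # final reformulation
]
print(session_ap_model_free(session))
print(expected_session_utility(session))
```

The contrast mirrors the one drawn in the abstract: the first function depends only on the relevance of what was retrieved across the session, while the second rewards systems more for relevance found early, both within a ranking and at earlier reformulations, because later positions are reached with geometrically shrinking probability.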