IR research has a strong tradition of laboratory evaluation of systems. Such research is based on test collections, pre-defined test topics, and standard evaluation metrics. While recent research has emphasized the user viewpoint by proposing user-based metrics and non-binary relevance assessments, these methods remain insufficient for truly user-based evaluation. The common assumption of a single query per topic and session poorly represents real-life searching. On the other hand, one well-known metric for multiple queries per session, instance recall, does not capture early (within-session) retrieval of (highly) relevant documents. We propose an extension to the Discounted Cumulated Gain (DCG) metric, the Session-based DCG (sDCG) metric, for evaluation scenarios involving multiple-query sessions, graded relevance assessments, and open-ended user effort, including decisions to stop searching. The sDCG metric discounts relevant results retrieved by later queries within a session. We exemplify the sDCG metric with data from an interactive experiment, discuss how the metric might be applied, and present research questions for which the metric is helpful.
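As a minimal illustration of how such a session-level discount could be computed, the sketch below assumes a logarithmic discounting scheme: within each query's ranked result list, graded gains are discounted by log_b(rank) beyond rank b (standard DCG), and the DCG of the j-th query in the session is further divided by (1 + log_bq(j)), so documents found only after reformulations contribute less. The function names, parameter defaults (b = 2, bq = 4), and example gain values are illustrative assumptions, not figures from the paper's experiments.

```python
import math


def dcg(gains, b=2):
    """Discounted cumulated gain for one ranked result list.

    Gains at ranks deeper than b are divided by log_b(rank);
    earlier ranks contribute their full graded gain.
    """
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        discount = math.log(rank, b) if rank > b else 1.0
        total += gain / discount
    return total


def sdcg(session, b=2, bq=4):
    """Session-based DCG sketch.

    `session` is a list of per-query gain lists, in the order the
    queries were issued. The DCG of the j-th query is divided by
    (1 + log_bq(j)), so relevant results retrieved by later
    reformulations count less than those retrieved early.
    """
    total = 0.0
    for j, gains in enumerate(session, start=1):
        query_discount = 1.0 + math.log(j, bq)
        total += dcg(gains, b) / query_discount
    return total


# Hypothetical three-query session with graded relevance gains (0-3):
session = [
    [3, 0, 1, 0],  # first query
    [2, 2, 0],     # first reformulation
    [0, 3, 1],     # second reformulation
]
print(round(sdcg(session), 3))
```

Under this scheme, a highly relevant document retrieved at rank 1 of the first query contributes its full gain, while the same document found only by a later reformulation is doubly discounted, reflecting the additional user effort spent reformulating.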