There is overwhelming evidence that real users of IR systems often prefer extremely short queries (one or two individual words) and try out several queries if needed. Such behavior differs fundamentally from the process modeled in traditional test collection-based IR evaluation, which assumes more verbose queries and only one query per topic. In the present paper, we propose an extension to test collection-based evaluation: sequences of short queries based on empirically grounded but idealized session strategies. We employ TREC data and ask test persons to suggest search words, while simulating the sessions according to the idealized strategies for repeatability and control. The experimental results show that, surprisingly, web-like very short queries (including sequences of one-word queries) typically lead to good-enough results even in a TREC-type test collection. This finding explains the observed real-user behavior: because a few very simple attempts normally yield good-enough results, there is no need to invest more effort. We conclude by discussing the consequences of this finding for IR evaluation.
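To make the kind of idealized session strategy described above concrete, the sketch below simulates a searcher who issues one-word queries from an ordered list of candidate keywords (e.g., words suggested by test persons) and stops as soon as the top of the result list is good enough. This is a minimal illustrative sketch, not the authors' exact protocol: the search function, the relevance oracle, the cutoff depth, and the stopping rule are all assumed placeholders.

```python
from typing import Callable, List, Sequence, Tuple

Ranking = List[str]                 # document ids, best first
SearchFn = Callable[[str], Ranking]


def simulate_session(
    keywords: Sequence[str],
    search: SearchFn,
    is_relevant: Callable[[str], bool],
    depth: int = 10,
    needed: int = 1,
) -> Tuple[List[str], bool]:
    """Issue one-word queries in order until `needed` relevant
    documents appear in the top `depth` results, or the candidate
    keywords run out.

    Returns the queries issued and whether the session succeeded
    under this (assumed) "good enough" stopping rule."""
    issued: List[str] = []
    for word in keywords:
        issued.append(word)
        top = search(word)[:depth]
        hits = sum(1 for doc in top if is_relevant(doc))
        if hits >= needed:          # good-enough result found: stop
            return issued, True
    return issued, False            # all attempts exhausted


# Toy usage with a hypothetical index mapping words to ranked doc ids.
toy_index = {"wolf": ["d3", "d7"], "wolves": ["d7", "d1", "d9"]}
queries, ok = simulate_session(
    keywords=["wolf", "wolves"],
    search=lambda q: toy_index.get(q, []),
    is_relevant=lambda doc: doc == "d1",
    depth=5,
)
# queries == ["wolf", "wolves"], ok == True: the second one-word
# attempt already surfaces a relevant document near the top.
```

Running such a simulation per TREC topic, with relevance judged against the collection's qrels, mirrors the paper's observation: a few one-word attempts at a shallow cutoff are often sufficient, which is exactly why real users rarely invest in longer queries.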