There is overwhelming evidence that real users of IR systems often prefer extremely short queries (one or two individual words) and try out several queries if needed. Such behavior differs fundamentally from the process modeled in traditional test collection-based IR evaluation, which assumes more verbose queries and only one query per topic. In the present paper, we propose an extension to test collection-based evaluation: sequences of short queries based on empirically grounded but idealized session strategies. We employ TREC data and ask test persons to suggest search words, while simulating the sessions according to the idealized strategies for repeatability and control. The experimental results show that, surprisingly, web-like very short queries (including sequences of one-word queries) typically lead to good-enough results even in a TREC-type test collection. This finding explains the observed real-user behavior: because a few very simple attempts normally yield good-enough results, there is no need to invest more effort. We conclude by discussing the consequences of this finding for IR evaluation.
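To make the kind of idealized session strategy described above concrete, the sketch below simulates a searcher who issues one-word queries from an ordered list of candidate keywords (e.g., words suggested by test persons) and stops as soon as the top of the result list is good enough. This is a minimal illustrative sketch, not the authors' exact protocol: the search function, the relevance oracle, the cutoff depth, and the stopping rule are all assumed placeholders.

```python
from typing import Callable, List, Sequence, Tuple

Ranking = List[str]                 # document ids, best first
SearchFn = Callable[[str], Ranking]


def simulate_session(
    keywords: Sequence[str],
    search: SearchFn,
    is_relevant: Callable[[str], bool],
    depth: int = 10,
    needed: int = 1,
) -> Tuple[List[str], bool]:
    """Issue one-word queries in order until `needed` relevant
    documents appear in the top `depth` results, or the candidate
    keywords run out.

    Returns the queries issued and whether the session succeeded
    under this (assumed) "good enough" stopping rule."""
    issued: List[str] = []
    for word in keywords:
        issued.append(word)
        top = search(word)[:depth]
        hits = sum(1 for doc in top if is_relevant(doc))
        if hits >= needed:          # good-enough result found: stop
            return issued, True
    return issued, False            # all attempts exhausted


# Toy usage with a hypothetical index mapping words to ranked doc ids.
toy_index = {"wolf": ["d3", "d7"], "wolves": ["d7", "d1", "d9"]}
queries, ok = simulate_session(
    keywords=["wolf", "wolves"],
    search=lambda q: toy_index.get(q, []),
    is_relevant=lambda doc: doc == "d1",
    depth=5,
)
# queries == ["wolf", "wolves"], ok == True: the second one-word
# attempt already surfaces a relevant document near the top.
```

Running such a simulation per TREC topic, with relevance judged against the collection's qrels, mirrors the paper's observation: a few one-word attempts at a shallow cutoff are often sufficient, which is exactly why real users rarely invest in longer queries.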