This paper presents results comparing user preferences for search engine rankings with effectiveness measures computed from a test collection. It establishes that preferences and evaluation measures correlate: systems measured as better on a test collection are preferred by users. This correlation holds both for "conventional web retrieval" and for retrieval that emphasizes diverse results. Of a selection of other well-known measures, nDCG is found to correlate best with user preferences. Unlike previous studies in this area, the examination involved a large population of users, recruited through crowdsourcing, who were exposed to a wide range of retrieval systems, test collections, and search tasks. Reasons for user preferences were also gathered and analyzed. The work revealed a number of new results, but also showed that there is considerable scope for future work on refining effectiveness measures to better capture user preferences.
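For readers unfamiliar with the measure the study found most predictive, the following is a minimal sketch of how nDCG at a rank cutoff is typically computed from graded relevance judgments. The exponential gain formulation, the cutoff, and the example grades are illustrative assumptions, not the exact configuration used in the study.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg(ranked_relevances, judged_relevances, k=10):
    """nDCG@k: DCG of the system ranking normalised by the ideal DCG,
    where the ideal ranking sorts the judged grades in descending order."""
    ideal = dcg(sorted(judged_relevances, reverse=True), k)
    return dcg(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical example: grades (0-3) of the documents a system returned,
# and the pool of judged grades used to form the ideal ranking.
system_run = [3, 2, 3, 0, 1, 2]
judgment_pool = [3, 3, 3, 2, 2, 1, 0, 0]
print(ndcg(system_run, judgment_pool, k=5))
```

In a preference-based comparison of the kind described above, a score such as this would be computed per query for each system, and the system with the higher score would be predicted to be the one users prefer.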