Ranking retrieval systems without relevance judgments. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
A Task-Oriented Non-Interactive Evaluation Methodology for Information Retrieval Systems. Information Retrieval.
Optimizing search engines using clickthrough data. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining.
Implicit feedback for inferring user preference: a bibliography. ACM SIGIR Forum.
Retrieval evaluation with incomplete information. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval.
Evaluating implicit measures to improve web search. ACM Transactions on Information Systems (TOIS).
TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing).
Learning user interaction models for predicting web search result preferences. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
User performance versus precision measures for simple search tasks. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
Minimal test collections for retrieval evaluation. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Transactions on Information Systems (TOIS).
Automatic search engine performance evaluation with click-through data analysis. Proceedings of the 16th international conference on World Wide Web.
How well does result relevance predict session satisfaction? SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval.
Introduction to Information Retrieval.
Here or there: preference judgments for relevance. ECIR '08 Proceedings of the IR research, 30th European conference on Advances in information retrieval.
Analysis of long queries in a large scale search log. Proceedings of the 2009 workshop on Web Search Click Data.
Comparative analysis of clicks and judgments for IR evaluation. Proceedings of the 2009 workshop on Web Search Click Data.
Tailoring click models to user goals. Proceedings of the 2009 workshop on Web Search Click Data.
Interactively optimizing information retrieval systems as a dueling bandits problem. ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning.
Query-page intention matching using clicked titles and snippets to boost search rankings. Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries.
Modeling and predicting user behavior in sponsored search. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
PSkip: estimating relevance ranking quality from web search clickthrough data. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
Adaptation of offline vertical selection predictions in the presence of user feedback. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval.
Expected reciprocal rank for graded relevance. Proceedings of the 18th ACM conference on Information and knowledge management.
Improving web page classification by label-propagation over click graphs. Proceedings of the 18th ACM conference on Information and knowledge management.
Evaluation of methods for relative comparison of retrieval systems based on clickthroughs. Proceedings of the 18th ACM conference on Information and knowledge management.
Model adaptation via model interpolation and boosting for web search ranking. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2.
Empirical exploitation of click data for task specific ranking. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3.
Improving quality of training data for learning to rank using click-through data. Proceedings of the third ACM international conference on Web search and data mining.
Inferring search behaviors using partially observable Markov (POM) model. Proceedings of the third ACM international conference on Web search and data mining.
Beyond DCG: user behavior as a predictor of a successful search. Proceedings of the third ACM international conference on Web search and data mining.
Today's and tomorrow's retrieval practice in the audiovisual archive. Proceedings of the ACM International Conference on Image and Video Retrieval.
Relevance and ranking in online dating systems. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval.
Learning more powerful test statistics for click-based retrieval evaluation. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval.
Do user preferences and evaluation measures line up? Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval.
Comparing the sensitivity of information retrieval metrics. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval.
Comparing click-through data to purchase decisions for retrieval evaluation. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval.
Evaluating search systems using result page context. Proceedings of the third symposium on Information interaction in context.
Personalizing web search using long term browsing history. Proceedings of the fourth ACM international conference on Web search and data mining.
Evaluating search engines by clickthrough data. ISWC '10 Proceedings of the 9th international semantic web conference on The semantic web - Volume Part II.
Addressing people's information needs directly in a web search result page. Proceedings of the 20th international conference on World Wide Web.
Learning to re-rank: query-dependent image re-ranking using click data. Proceedings of the 20th international conference on World Wide Web.
Balancing exploration and exploitation in learning to rank online. ECIR '11 Proceedings of the 33rd European conference on Advances in information retrieval.
Crowdsourcing for book search evaluation: impact of hit design on comparative system ranking. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval.
Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval.
Search snippet evaluation at Yandex: lessons learned and future directions. CLEF '11 Proceedings of the Second international conference on Multilingual and multimodal information access evaluation.
A probabilistic method for inferring preferences from clicks. Proceedings of the 20th ACM international conference on Information and knowledge management.
A nugget-based test collection construction paradigm. Proceedings of the 20th ACM international conference on Information and knowledge management.
Recency ranking by diversification of result set. Proceedings of the 20th ACM international conference on Information and knowledge management.
Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems (TOIS).
IR system evaluation using nugget-based test collections. Proceedings of the fifth ACM international conference on Web search and data mining.
Interpreting user inactivity on search results. ECIR '10 Proceedings of the 32nd European conference on Advances in Information Retrieval.
Hierarchical composable optimization of web pages. Proceedings of the 21st international conference companion on World Wide Web.
Learning from users' querying experience on intranets. Proceedings of the 21st international conference companion on World Wide Web.
Enriching query flow graphs with click information. AIRS '11 Proceedings of the 7th Asia conference on Information Retrieval Technology.
The K-armed dueling bandits problem. Journal of Computer and System Sciences.
Analysis of query reformulations in a search engine of a local web site. ECIR '12 Proceedings of the 34th European conference on Advances in Information Retrieval.
Adaptation of the concept hierarchy model with search logs for query recommendation on intranets. SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval.
A semi-supervised approach to modeling web search satisfaction. SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval.
An Online Learning Framework for Refining Recency Search Results with User Click Feedback. ACM Transactions on Information Systems (TOIS).
On caption bias in interleaving experiments. Proceedings of the 21st ACM international conference on Information and knowledge management.
Incorporating variability in user behavior into systems based evaluation. Proceedings of the 21st ACM international conference on Information and knowledge management.
Constructing test collections by inferring document relevance via extracted relevant information. Proceedings of the 21st ACM international conference on Information and knowledge management.
Estimating interleaved comparison outcomes from historical click data. Proceedings of the 21st ACM international conference on Information and knowledge management.
Enriching Documents with Examples: A Corpus Mining Approach. ACM Transactions on Information Systems (TOIS).
Differences in search engine evaluations between query owners and non-owners. Proceedings of the sixth ACM international conference on Web search and data mining.
Absence time and user engagement: evaluating ranking functions. Proceedings of the sixth ACM international conference on Web search and data mining.
Reusing historical interaction data for faster online learning to rank for IR. Proceedings of the sixth ACM international conference on Web search and data mining.
Optimized interleaving for online retrieval evaluation. Proceedings of the sixth ACM international conference on Web search and data mining.
Practical online retrieval evaluation. ECIR '13 Proceedings of the 35th European conference on Advances in Information Retrieval.
An analysis of human factors and label accuracy in crowdsourcing relevance judgments. Information Retrieval.
Mining large streams of user data for personalized recommendations. ACM SIGKDD Explorations Newsletter.
User model-based metrics for offline query suggestion evaluation. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval.
Click model-based information retrieval metrics. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval.
Fighting search engine amnesia: reranking repeated results. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval.
Preference based evaluation measures for novelty and diversity. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval.
Evaluating and predicting user engagement change with degraded search relevance. Proceedings of the 22nd international conference on World Wide Web.
Personalization of web-search using short-term browsing context. Proceedings of the 22nd ACM international conference on Conference on information & knowledge management.
Personalized models of search satisfaction. Proceedings of the 22nd ACM international conference on Conference on information & knowledge management.
Using historical click data to increase interleaving sensitivity. Proceedings of the 22nd ACM international conference on Conference on information & knowledge management.
Evaluating aggregated search using interleaving. Proceedings of the 22nd ACM international conference on Conference on information & knowledge management.
Lerot: an online learning to rank framework. Proceedings of the 2013 workshop on Living labs for information retrieval evaluation.
Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods. ACM Transactions on Information Systems (TOIS).
Modeling dwell time to predict click-level satisfaction. Proceedings of the 7th ACM international conference on Web search and data mining.
Relative confidence sampling for efficient on-line ranker evaluation. Proceedings of the 7th ACM international conference on Web search and data mining.
Automatically judging the quality of retrieval functions from observable user behavior holds promise for making retrieval evaluation faster, cheaper, and more user-centered. However, the relationship between observable user behavior and retrieval quality is not yet well understood. We present a sequence of studies investigating this relationship for an operational search engine on the arXiv.org e-print archive. We find that none of the eight absolute usage metrics we explore (e.g., number of clicks, frequency of query reformulations, abandonment) reliably reflects retrieval quality at the sample sizes we consider. In contrast, paired experiment designs adapted from sensory analysis produce accurate and reliable statements about the relative quality of two retrieval functions. In particular, we investigate two paired comparison tests that analyze clickthrough data from an interleaved presentation of ranking pairs, and we find that both give accurate and consistent results. We conclude that in our domain these paired comparison tests are substantially more accurate and sensitive than absolute usage metrics.
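The interleaved evaluation the abstract refers to can be sketched in a few lines. Below is a minimal Python sketch of team-draft interleaving, one common variant of interleaved presentation with click-based credit assignment; the function names and the simple click-counting credit rule are illustrative assumptions, not the exact algorithm evaluated in the paper:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Team-draft interleaving: the two rankers alternately contribute their
    highest-ranked not-yet-shown document; ties in contribution count are
    broken randomly. Returns the interleaved list and which 'team' (ranker)
    contributed each position."""
    lists = {"A": ranking_a, "B": ranking_b}
    pos = {"A": 0, "B": 0}      # next candidate index in each ranking
    count = {"A": 0, "B": 0}    # documents contributed so far by each team
    interleaved, teams, seen = [], [], set()

    def next_doc(team):
        # Skip documents already shown; return None if this ranking is exhausted.
        lst = lists[team]
        while pos[team] < len(lst) and lst[pos[team]] in seen:
            pos[team] += 1
        return lst[pos[team]] if pos[team] < len(lst) else None

    while True:
        if count["A"] < count["B"]:
            team = "A"
        elif count["B"] < count["A"]:
            team = "B"
        else:
            team = rng.choice(["A", "B"])  # tie: pick the next team at random
        doc = next_doc(team)
        if doc is None:                    # this ranker is exhausted
            team = "B" if team == "A" else "A"
            doc = next_doc(team)
            if doc is None:
                break                      # both rankers exhausted
        interleaved.append(doc)
        teams.append(team)
        seen.add(doc)
        count[team] += 1
    return interleaved, teams

def credit(teams, clicked_positions):
    """Credit each click to the ranker that contributed the clicked result;
    the ranker with more credited clicks wins the query."""
    wins = {"A": 0, "B": 0}
    for i in clicked_positions:
        wins[teams[i]] += 1
    return wins
```

Aggregated over many queries, the fraction of wins for each ranker yields the paired-comparison outcome; the randomized tie-breaking is what keeps the presentation unbiased toward either ranker.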