How does clickthrough data reflect retrieval quality?

Authors: Filip Radlinski, Madhu Kurup, Thorsten Joachims
Affiliation: Cornell University, Ithaca, NY, USA (all authors)
Venue: Proceedings of the 17th ACM Conference on Information and Knowledge Management
Year: 2008

Abstract

Automatically judging the quality of retrieval functions based on observable user behavior holds promise for making retrieval evaluation faster, cheaper, and more user centered. However, the relationship between observable user behavior and retrieval quality is not yet fully understood. We present a sequence of studies investigating this relationship for an operational search engine on the arXiv.org e-print archive. We find that none of the eight absolute usage metrics we explore (e.g., number of clicks, frequency of query reformulations, abandonment) reliably reflect retrieval quality for the sample sizes we consider. However, we find that paired experiment designs adapted from sensory analysis produce accurate and reliable statements about the relative quality of two retrieval functions. In particular, we investigate two paired comparison tests that analyze clickthrough data from an interleaved presentation of ranking pairs, and we find that both give accurate and consistent results. We conclude that both paired comparison tests give substantially more accurate and sensitive evaluation results than absolute usage metrics in our domain.
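
To make the idea of a paired comparison over an interleaved presentation concrete, here is a minimal sketch in Python. It assumes a team-draft style of interleaving, which is one common scheme and not necessarily the exact procedure used in the studies summarized above. The function names (`team_draft_interleave`, `compare_by_clicks`) and the document identifiers are hypothetical and chosen only for illustration.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two rankings into one list (illustrative sketch).

    Returns the merged list and a dict mapping each document to the
    ranker ("A" or "B") that contributed it.
    """
    rng = rng or random.Random()
    merged, team = [], {}
    count = {"A": 0, "B": 0}
    rankings = {"A": ranking_a, "B": ranking_b}
    while True:
        # Documents each ranker could still contribute, in its own order.
        remaining = {k: [d for d in r if d not in team]
                     for k, r in rankings.items()}
        if not remaining["A"] and not remaining["B"]:
            break
        # The ranker with fewer picks goes next; ties use a coin flip.
        if count["A"] < count["B"]:
            pick = "A"
        elif count["B"] < count["A"]:
            pick = "B"
        else:
            pick = rng.choice(["A", "B"])
        # If the chosen ranker has nothing left, the other one picks.
        if not remaining[pick]:
            pick = "B" if pick == "A" else "A"
        doc = remaining[pick][0]
        merged.append(doc)
        team[doc] = pick
        count[pick] += 1
    return merged, team

def compare_by_clicks(team, clicked_docs):
    """Credit each click to the ranker that contributed the document."""
    a = sum(1 for d in clicked_docs if team.get(d) == "A")
    b = sum(1 for d in clicked_docs if team.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"

# Hypothetical rankings from two retrieval functions for one query.
merged, team = team_draft_interleave(
    ["d1", "d2", "d3"], ["d2", "d4", "d1"], random.Random(0)
)
# Suppose the user clicked the first result of the merged list.
winner = compare_by_clicks(team, [merged[0]])
```

Under this sketch, each query yields a per-impression preference ("A", "B", or "tie"). Aggregating those preferences over many impressions, for example with a sign test, would give a relative statement about the two retrieval functions, in contrast to absolute metrics such as click counts or abandonment rates computed for each function separately.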