PSkip: estimating relevance ranking quality from web search clickthrough data

Authors:
Kuansan Wang;Toby Walker;Zijian Zheng
Affiliations:
Microsoft Corporation, Redmond, WA, USA;Microsoft Corporation, Redmond, WA, USA;Microsoft Corporation, Redmond, WA, USA
Venue:
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2009

Citing 30
Cited 26

An introduction to signal detection and estimation (2nd ed.)

An introduction to signal detection and estimation (2nd ed.)
Information filtering based on user behavior analysis and best match text retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluation of evaluation in information retrieval

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient construction of large test collections

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
How reliable are the results of large-scale information retrieval experiments?

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Measures of relative relevance and ranked half-life: performance indicators for interactive IR

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Text retrieval and filtering: analytic models of performance

Text retrieval and filtering: analytic models of performance
Agglomerative clustering of a search engine query log

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Implicit interest indicators

Proceedings of the 6th international conference on Intelligent user interfaces
Ranking retrieval systems without relevance judgments

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Cumulated gain-based evaluation of IR techniques

ACM Transactions on Information Systems (TOIS)
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
A taxonomy of web search

ACM SIGIR Forum
Implicit feedback for inferring user preference: a bibliography

ACM SIGIR Forum
Retrieval evaluation with incomplete information

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating implicit measures to improve web search

ACM Transactions on Information Systems (TOIS)
Query chains: learning to rank from implicit feedback

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Information retrieval in context: IRiX

ACM SIGIR Forum
Learning user interaction models for predicting web search result preferences

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
A statistical method for system evaluation using incomplete judgments

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search

ACM Transactions on Information Systems (TOIS)
Investigating behavioral variability in web search

Proceedings of the 16th international conference on World Wide Web
Random walks on the click graph

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
TREC: Continuing information retrieval's tradition of experimentation

Communications of the ACM
Are people biased in their use of search engines?

Communications of the ACM - Alternate reality gaming
An experimental comparison of click position-bias models

WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
A user browsing model to predict search engine click data from past observations.

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Bypass rates: reducing query abandonment using negative inferences

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
How does clickthrough data reflect retrieval quality?

Proceedings of the 17th ACM conference on Information and knowledge management
Efficient multiple-click models in web search

Proceedings of the Second ACM International Conference on Web Search and Data Mining

Inferring search behaviors using partially observable Markov (POM) model

Proceedings of the third ACM international conference on Web search and data mining
Building taxonomy of web search intents for name entity queries

Proceedings of the 19th international conference on World wide web
Beyond position bias: examining result attractiveness as a source of presentation bias in clickthrough data

Proceedings of the 19th international conference on World wide web
Co-optimization of multiple relevance metrics in web search

Proceedings of the 19th international conference on World wide web
Learning more powerful test statistics for click-based retrieval evaluation

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
A user behavior model for average precision and its generalization to graded judgments

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Evaluating search systems using result page context

Proceedings of the third symposium on Information interaction in context
Bayesian Browsing Model: Exact Inference of Document Relevance from Petabyte-Scale Data

ACM Transactions on Knowledge Discovery from Data (TKDD)
Inferring search behaviors using partially observable markov model with duration (POMD)

Proceedings of the fourth ACM international conference on Web search and data mining
Addressing people's information needs directly in a web search result page

Proceedings of the 20th international conference on World wide web
Web scale NLP: a case study on url word breaking

Proceedings of the 20th international conference on World wide web
No clicks, no problem: using cursor movements to understand and improve search

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Inferring and using location metadata to personalize web search

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Personalizing web search results by reading level

Proceedings of the 20th ACM international conference on Information and knowledge management
Large-scale validation and analysis of interleaved search evaluation

ACM Transactions on Information Systems (TOIS)
Characterizing web content, user interests, and search behavior by reading level and topic

Proceedings of the fifth ACM international conference on Web search and data mining
Probabilistic models for personalizing web search

Proceedings of the fifth ACM international conference on Web search and data mining
Mining for insights in the search engine query stream

Proceedings of the 21st international conference companion on World Wide Web
Social annotations: utility and prediction modeling

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
On caption bias in interleaving experiments

Proceedings of the 21st ACM international conference on Information and knowledge management
Optimized interleaving for online retrieval evaluation

Proceedings of the sixth ACM international conference on Web search and data mining
Practical online retrieval evaluation

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Fighting search engine amnesia: reranking repeated results

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Evaluating and predicting user engagement change with degraded search relevance

Proceedings of the 22nd international conference on World Wide Web
Through-the-looking glass: utilizing rich post-search trail statistics for web search

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this article, we report our efforts in mining the information encoded as clickthrough data in the server logs to evaluate and monitor the relevance ranking quality of a commercial web search engine. We describe a metric called pSkip that aims to quantify the ranking quality by estimating the probability of users encountering non relevant results that cost them the efforts to read and skip. A search engine with a lower pSkip is regarded as having a better ranking quality. A key design goal of pSkip is to integrate the findings from two sets of user studies that utilize eye-tracking devices to track users' browsing patterns on the search result pages, and that use specially instrumented browsers to actively solicit users' explicit judgments on their search activities. We present the derivation of the maximum likelihood estimation of pSkip and demonstrate its efficacy in describing the user study data. The mathematical properties of pSkip are further analyzed and compared with several objective metrics as well as the cumulated gain method that uses subjective judgments. Experimental data show that pSkip can measure aspects of the search quality that these existing metrics are not designed or fail to address, such as identifying the real search intents expressed in the ambiguous queries. Although effective and superior in many ways, we also report a series of experiments that show pSkip may be influenced by system issues that are not directly related to relevance ranking, suggesting that measurements complementary to pSkip are still needed in order to form a holistic and accurate characterization of the ranking quality.