Model characterization curves for federated search using click-logs: predicting user engagement metrics for the span of feasible operating points

  • Authors:
  • Ashok Kumar Ponnuswami, Kumaresh Pattabiraman, Desmond Brand, Tapas Kanungo

  • Affiliations:
  • Microsoft Corporation, Redmond, WA, USA (all authors)

  • Venue:
  • Proceedings of the 20th International Conference on World Wide Web
  • Year:
  • 2011


Abstract

Modern-day federated search engines aggregate heterogeneous types of results from multiple vertical search engines and compose a single search engine result page (SERP). The search engine aggregates the results and produces one ranked list, constraining the vertical results to specific slots on the SERP. The usual way to compare two ranking algorithms is to first fix their operating points (internal thresholds), and then run an online experiment that lasts multiple weeks. Online user engagement metrics are then compared to decide which algorithm is better. However, this method does not characterize and compare behavior over the entire span of operating points. Furthermore, this time-consuming approach is not practical if we have to conduct the experiment over numerous operating points. In this paper we propose a method of characterizing the performance of models that allows us to predict answers to "what if" questions about online user engagement using click-logs over the entire span of feasible operating points. We audition verticals at various slots on the SERP and generate click-logs. These logs are then used to create operating curves between variables of interest (for example, between result quality and click-through rate). The operating point for the system can then be chosen to achieve a specific trade-off between the variables. We apply this methodology to predict i) the online performance of two different models, ii) the impact of changing internal quality thresholds on click-through, iii) the behavior of introducing a new feature, iv) which machine learning loss function will give better online engagement, and v) the impact of the sampling distribution of head and tail queries in the training process. The results are reported on a well-known federated search engine. We validate the predictions with online experiments.
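The core idea of the abstract — sweeping a model's internal threshold over auditioned click-logs to trace an operating curve between trigger rate and click-through — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the `Impression` record, the `operating_curve` helper, and the sample log are hypothetical names and data invented for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Impression:
    """One auditioned vertical impression from the click-log."""
    score: float   # vertical-relevance score assigned by the ranking model
    clicked: bool  # whether the user clicked the auditioned vertical result

def operating_curve(log, thresholds):
    """Sweep candidate thresholds over the click-log.

    For each threshold t, the vertical is (hypothetically) shown only when
    score >= t. Returns one (threshold, trigger_rate, click_through_rate)
    point per threshold, tracing the trade-off curve offline.
    """
    points = []
    for t in thresholds:
        shown = [imp for imp in log if imp.score >= t]
        trigger_rate = len(shown) / len(log)
        ctr = sum(imp.clicked for imp in shown) / len(shown) if shown else 0.0
        points.append((t, trigger_rate, ctr))
    return points

# Tiny hypothetical log: high-scored impressions tend to attract clicks.
log = [
    Impression(0.9, True),
    Impression(0.8, False),
    Impression(0.3, True),
    Impression(0.1, False),
]
curve = operating_curve(log, thresholds=[0.0, 0.5])
```

Given such a curve, an operating point can be picked to hit a desired trade-off (say, a minimum click-through rate subject to a trigger-rate budget) without running a multi-week online experiment per candidate threshold.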