Modern-day federated search engines aggregate heterogeneous types of results from multiple vertical search engines and compose a single search engine result page (SERP). The engine merges the results into one ranked list, constraining the vertical results to specific slots on the SERP. The usual way to compare two ranking algorithms is to fix their operating points (internal thresholds) and then run an online experiment lasting several weeks; online user-engagement metrics are then compared to decide which algorithm is better. However, this method does not characterize or compare behavior over the entire span of operating points, and its time cost makes it impractical to repeat over numerous operating points. In this paper we propose a method of characterizing model performance that lets us predict answers to "what if" questions about online user engagement using click logs over the entire span of feasible operating points. We audition verticals at various slots on the SERP and record the resulting click logs. These logs are then used to construct operating curves between variables of interest (for example, between result quality and click-through rate). The operating point for the system can then be chosen to achieve a specific trade-off between the variables. We apply this methodology to predict i) the online performance of two different models, ii) the impact of changing internal quality thresholds on click-through, iii) the behavior of a newly introduced feature, iv) which machine-learning loss function will give better online engagement, and v) the impact of the sampling distribution of head and tail queries in the training process. The results are reported on a well-known federated search engine, and we validate the predictions with online experiments.
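The operating-curve idea above can be sketched in a few lines: sweep a quality threshold over an auditioned click log, and for each threshold record how often the vertical would be shown and the click-through rate among impressions where it is shown. This is a minimal illustrative sketch, not the paper's implementation; the log schema, field names, and toy data are assumptions.

```python
# Minimal sketch of building an operating curve from an auditioned click log.
# Each log entry records the model's quality score for the auditioned vertical
# and whether the user clicked it. (Schema and data are illustrative.)
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LogEntry:
    quality: float   # model's quality score for the auditioned vertical result
    clicked: bool    # whether the user clicked that vertical result

def operating_curve(log: List[LogEntry],
                    thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """For each candidate threshold t, simulate showing the vertical only when
    quality >= t, and report (t, trigger rate, click-through rate when shown)."""
    curve = []
    for t in thresholds:
        shown = [e for e in log if e.quality >= t]
        trigger_rate = len(shown) / len(log) if log else 0.0
        ctr = sum(e.clicked for e in shown) / len(shown) if shown else 0.0
        curve.append((t, trigger_rate, ctr))
    return curve

# Toy auditioned log: higher-quality verticals tend to attract more clicks.
log = [LogEntry(0.9, True), LogEntry(0.8, True), LogEntry(0.6, False),
       LogEntry(0.5, True), LogEntry(0.3, False), LogEntry(0.2, False)]
curve = operating_curve(log, [0.0, 0.4, 0.7])
for t, trigger, ctr in curve:
    print(f"threshold={t:.1f}  trigger_rate={trigger:.2f}  ctr={ctr:.2f}")
```

Raising the threshold trades coverage (trigger rate) for precision (click-through rate); an operating point is then picked off this curve at the desired trade-off, answering the "what if" question offline instead of via a multi-week online experiment per threshold.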