We provide a novel method of evaluating search results that combines existing editorial judgments with the relevance estimates generated by click-based user browsing models. Evaluation methods that use clicks and editorial judgments together exist in the literature, but our approach is novel in that it predicts the impact of unseen search models without online tests to collect clicks and without requesting new editorial data: we re-use only existing editorial judgments and clicks observed for previous result-set configurations. Since the user browsing model and the pre-existing editorial data cannot provide relevance estimates for every document retrieved for the selected queries, one important challenge is to estimate performance when many ranked documents have missing relevance values. We introduce query-based and position-based smoothing to overcome this problem, and show that a hybrid of the two performs better than either smoothing technique alone. Despite the high percentage of missing judgments, the resulting method is significantly correlated (0.74) with DCG values computed on fully judged datasets, approaching inter-annotator agreement. We also show that previously published techniques, which are applicable to frequent queries, degrade when applied to a random sample of queries, yielding a correlation of only 0.29. While our experiments focus on evaluation using DCG, our method is also applicable to other commonly used metrics.
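To make the evaluation concrete, below is a minimal sketch of how DCG might be computed over a ranked list in which missing judgments are filled by a hybrid of query-based and position-based smoothing. All names (smoothed_relevance, the alpha mixing weight, the example data) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import math

def dcg(relevances):
    """Standard graded DCG: sum of (2^rel - 1) / log2(rank + 1), ranks from 1."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def smoothed_relevance(query, rank, judged, query_mean, position_mean, alpha=0.5):
    """Fill a missing judgment with a convex mix of the query's mean relevance
    and the mean relevance observed at this rank position (hybrid smoothing).
    `judged` holds known (query, rank) -> relevance values, whether from
    editorial labels or click-model estimates."""
    if (query, rank) in judged:
        return judged[(query, rank)]
    q = query_mean.get(query)
    p = position_mean.get(rank)
    if q is not None and p is not None:
        return alpha * q + (1 - alpha) * p   # hybrid of the two smoothers
    if q is not None:
        return q                             # fall back to query-based smoothing
    return p if p is not None else 0.0       # or position-based, else unjudged

# Example usage with toy data: means would in practice be estimated from all
# available editorial judgments and click-model relevance estimates.
judged = {("q1", 1): 3, ("q1", 3): 1, ("q2", 2): 2}
query_mean = {"q1": 2.0, "q2": 2.0}
position_mean = {1: 2.5, 2: 1.5, 3: 1.0, 4: 0.5, 5: 0.3}

for query in ("q1", "q2"):
    rels = [smoothed_relevance(query, r, judged, query_mean, position_mean)
            for r in range(1, 6)]
    print(query, round(dcg(rels), 3))
```

The alpha weight controls how much the fill-in value trusts the query's overall relevance versus the typical relevance at that rank; a hybrid of this form is one plausible way the query-based and position-based smoothers described above could be combined.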