In many Web search engines, a ranking function is selected for deployment mainly by comparing the relevance measurements of candidate functions. Due to the dynamic nature of the Web, the ranking features, as well as the query and URL distributions on which the ranking functions are built, may change dramatically over time. The actual relevance of a deployed function may then degrade, invalidating the conclusions of the original selection. In this work we propose selecting Web ranking functions according to both their relevance and their robustness to the changes that can cause relevance to degrade over time. We argue that ranking robustness can be effectively measured by taking into account the distribution of ranking scores across search results. We then propose two alternatives to the NDCG metric, both of which incorporate ranking robustness into ranking function evaluation and selection. Finally, we develop a machine learning approach that learns, from human-judged preference data, the parameters controlling the metric's sensitivity to score turbulence.
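To make the idea concrete, the following is a minimal sketch of how a robustness-adjusted NDCG might look. The abstract does not specify the form of the two proposed metrics, so the `robustness_weight` factor below (based on the gaps between adjacent ranking scores: tightly packed scores mean small perturbations can reorder results) and the sensitivity parameter `alpha` are purely illustrative assumptions, not the authors' actual formulation.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain with the standard exponential gain
    # and log2 position discount.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def robustness_weight(scores, alpha=1.0):
    # Hypothetical robustness factor: the mean gap between adjacent
    # ranking scores proxies how easily score turbulence can swap
    # results. Larger gaps -> weight closer to 1 (more robust).
    # `alpha` plays the role of the learned sensitivity parameter.
    gaps = [abs(a - b) for a, b in zip(scores, scores[1:])]
    if not gaps:
        return 1.0
    mean_gap = sum(gaps) / len(gaps)
    return 1.0 - math.exp(-alpha * mean_gap)

def robust_ndcg(relevances, scores, alpha=1.0):
    # Relevance (NDCG) damped by the illustrative robustness factor.
    return ndcg(relevances) * robustness_weight(scores, alpha)
```

Under this sketch, two candidate functions with equal NDCG would be separated by their score distributions: the one whose results are spread farther apart in score space receives the higher robust score, matching the paper's intuition that robustness can be read off the score distribution.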