In many Web search engines, a ranking function is selected for deployment mainly by comparing the relevance measurements of candidate functions. Due to the dynamic nature of the Web, the ranking features, as well as the query and URL distributions on which the ranking functions are built, may change dramatically over time. The actual relevance of a deployed function may then degrade, invalidating the conclusions of the original selection. In this work we propose selecting Web ranking functions according to both their relevance and their robustness to the changes that can cause relevance to degrade over time. We argue that ranking robustness can be effectively measured by taking into account the distribution of ranking scores across search results. We then propose two alternatives to the NDCG metric, both of which incorporate ranking robustness into ranking function evaluation and selection. Finally, we develop a machine learning approach that learns, from human-judged preference data, the parameters controlling the metric's sensitivity to score turbulence.
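To make the idea concrete, the following is a minimal sketch of how a robustness-adjusted NDCG might look. The abstract does not specify the form of the two proposed metrics, so the `robustness_weight` factor below (based on the gaps between adjacent ranking scores: tightly packed scores mean small perturbations can reorder results) and the sensitivity parameter `alpha` are purely illustrative assumptions, not the authors' actual formulation.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain with the standard exponential gain
    # and log2 position discount.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def robustness_weight(scores, alpha=1.0):
    # Hypothetical robustness factor: the mean gap between adjacent
    # ranking scores proxies how easily score turbulence can swap
    # results. Larger gaps -> weight closer to 1 (more robust).
    # `alpha` plays the role of the learned sensitivity parameter.
    gaps = [abs(a - b) for a, b in zip(scores, scores[1:])]
    if not gaps:
        return 1.0
    mean_gap = sum(gaps) / len(gaps)
    return 1.0 - math.exp(-alpha * mean_gap)

def robust_ndcg(relevances, scores, alpha=1.0):
    # Relevance (NDCG) damped by the illustrative robustness factor.
    return ndcg(relevances) * robustness_weight(scores, alpha)
```

Under this sketch, two candidate functions with equal NDCG would be separated by their score distributions: the one whose results are spread farther apart in score space receives the higher robust score, matching the paper's intuition that robustness can be read off the score distribution.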