The whens and hows of learning to rank for web search

  • Authors:
  • Craig Macdonald; Rodrygo L. Santos; Iadh Ounis

  • Affiliations:
  • School of Computing Science, University of Glasgow, Scotland, UK (all authors)

  • Venue:
  • Information Retrieval
  • Year:
  • 2013

Abstract

Web search engines are increasingly deploying many features, combined using learning to rank techniques. However, various practical questions remain concerning the manner in which learning to rank should be deployed. For instance, a sample of documents with sufficient recall is used, such that re-ranking the sample with the learned model brings the relevant documents to the top. However, the properties of the document sample, such as when to stop ranking (i.e. its minimum effective size), remain unstudied. Similarly, effective listwise learning to rank techniques minimise a loss function corresponding to a standard information retrieval evaluation measure. However, the appropriate manner of calculating the loss function, i.e. the choice of learning evaluation measure and the rank depth at which that measure should be computed, is as yet unclear. In this paper, we address all of these issues by formulating various hypotheses and research questions, before performing exhaustive experiments using multiple learning to rank techniques and different types of information needs on the ClueWeb09 and LETOR corpora. Among many conclusions, we find, for instance, that the smallest effective sample for a given query set depends on the type of information need of the queries, the document representation used during sampling, and the test evaluation measure. As the sample size is varied, the selected features change markedly; for instance, link analysis features are favoured for smaller document samples. Moreover, despite reflecting a more realistic user model, the recently proposed ERR measure is not as effective as the traditional NDCG as a learning loss function. Overall, our comprehensive experiments provide the first empirical derivation of best practices for learning to rank deployments.
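
The two learning evaluation measures contrasted above, NDCG and ERR, are both computed to a fixed rank depth over graded relevance labels. The sketch below illustrates how each is typically calculated; the 0-4 grade scale, the exponential gain, and the example labels are illustrative assumptions following common TREC Web track practice, not details taken from the paper.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k graded labels (exponential gain)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal (re-sorted) ranking."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

def err_at_k(gains, k, max_grade=4):
    """ERR@k: expected reciprocal rank under the cascade user model."""
    err, p_continue = 0.0, 1.0
    for r, g in enumerate(gains[:k], start=1):
        stop = (2 ** g - 1) / (2 ** max_grade)  # probability the user is satisfied at rank r
        err += p_continue * stop / r
        p_continue *= (1 - stop)
    return err

# Hypothetical graded labels (0-4) of the top-ranked documents for one query
labels = [3, 0, 2, 1, 0]
print(ndcg_at_k(labels, k=5), err_at_k(labels, k=5))
```

In a listwise learning to rank setting, one of these measures, evaluated at a chosen cutoff k, serves as the quantity the learner optimises; the choice of measure and cutoff is precisely the question the paper investigates.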