Web search engines increasingly deploy many features, combined using learning to rank techniques. However, various practical questions remain concerning how learning to rank should be deployed. For instance, a sample of documents with sufficient recall is used, such that re-ranking the sample by the learned model brings the relevant documents to the top. However, the properties of the document sample, such as when to stop ranking (i.e., its minimum effective size), remain unstudied. Similarly, effective listwise learning to rank techniques minimise a loss function corresponding to a standard information retrieval evaluation measure. However, the appropriate way to calculate the loss function, i.e., which learning evaluation measure to use and the rank depth at which to compute it, remains unclear. In this paper, we address these issues by formulating various hypotheses and research questions, before performing exhaustive experiments using multiple learning to rank techniques and different types of information needs on the ClueWeb09 and LETOR corpora. Among many conclusions, we find, for instance, that the smallest effective sample for a given query set depends on the type of information need of the queries, the document representation used during sampling, and the test evaluation measure. As the sample size is varied, the selected features change markedly; for instance, link analysis features are favoured for smaller document samples. Moreover, despite reflecting a more realistic user model, the recently proposed expected reciprocal rank (ERR) measure is less effective as a learning loss function than the traditional NDCG. Overall, our comprehensive experiments provide the first empirical derivation of best practices for learning to rank deployments.
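For concreteness, the two evaluation measures contrasted above can be computed as follows. This is a minimal sketch, not code from the paper: it assumes graded relevance labels on a 0-4 scale, uses the common exponential-gain variant of NDCG, and follows the cascade user model of ERR (Chapelle et al., 2009); the `labels` list is illustrative only.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k graded labels."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

def err_at_k(gains, k, g_max=4):
    """ERR@k: expected reciprocal rank under a cascade user model, where
    R_i = (2^g_i - 1) / 2^g_max is the probability that the document at
    rank i satisfies the user, who then stops reading the list."""
    p_reach = 1.0  # probability the user reaches this rank
    err = 0.0
    for i, g in enumerate(gains[:k]):
        r = (2 ** g - 1) / (2 ** g_max)
        err += p_reach * r / (i + 1)
        p_reach *= 1 - r
    return err

# Graded labels of documents in ranked order for one query (illustrative):
labels = [3, 0, 4, 1, 0, 2]
print(ndcg_at_k(labels, 10), err_at_k(labels, 10))
```

The rank depth enters through the cutoff `k`: a listwise learner optimising NDCG@10 receives gradient information from a different portion of the ranking than one optimising ERR@10, since ERR's stopping probabilities sharply discount documents below the first highly relevant one.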