Bagging gradient-boosted trees for high precision, low variance ranking models

Authors:
Yasser Ganjisaffar;Rich Caruana;Cristina Videira Lopes
Affiliations:
University of California, Irvine, Irvine, CA, USA;Microsoft Research, Redmond, WA, USA;University of California, Irvine, Irvine, CA, USA
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Abstract

Recent studies have shown that boosting provides excellent predictive performance across a wide variety of tasks. In learning-to-rank, boosted models such as RankBoost and LambdaMART have been shown to be among the best-performing learning methods in evaluations on public data sets. In this paper, we show how combining bagging, a variance reduction technique, with boosting, a bias reduction technique, can yield ranking models with very high precision and low variance. We perform thousands of parameter tuning experiments for LambdaMART to obtain a high-precision boosting model. We then show that a bagged ensemble of such LambdaMART boosted models yields more accurate ranking models while also reducing variance by as much as 50%. We report results on three public learning-to-rank data sets using four metrics. Bagged LambdaMART outperforms all previously reported results on ten of the twelve comparisons, and it outperforms non-bagged LambdaMART on all twelve comparisons. For example, wrapping bagging around LambdaMART increases NDCG@1 from 0.4137 to 0.4200 on the MQ2007 data set; the best prior result in the literature for this data set is 0.4134, achieved by RankBoost.
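
The general idea of wrapping bagging around a boosted model can be illustrated with a minimal sketch. It is not the authors' implementation: as simplifying assumptions, it boosts one-split regression stumps on a single feature with squared-error loss instead of LambdaMART's ranking gradients, and it draws bootstrap samples over individual examples. All function names and parameter values here are hypothetical and chosen only for illustration.

```python
import random


def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    total = sum(residuals)
    left_sum = 0.0
    best = None
    for pos in range(1, n):
        i = order[pos - 1]
        left_sum += residuals[i]
        if xs[order[pos]] == xs[i]:
            continue  # no threshold fits between two equal feature values
        right_sum = total - left_sum
        # Maximizing this quantity is equivalent to minimizing squared error.
        gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos)
        if best is None or gain > best[0]:
            threshold = (xs[i] + xs[order[pos]]) / 2
            best = (gain, threshold, left_sum / pos, right_sum / (n - pos))
    if best is None:  # every feature value is identical: predict the mean
        return (xs[0], total / n, total / n)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def boosted_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def fit_bagged(xs, ys, n_bags=10, rng=None, **boost_params):
    """Train one boosted model per bootstrap sample (bagging)."""
    if rng is None:
        rng = random.Random(0)
    n = len(xs)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_boosted([xs[i] for i in idx],
                                  [ys[i] for i in idx], **boost_params))
    return models


def bagged_predict(models, x):
    """Average the boosted models; averaging is what reduces variance."""
    return sum(boosted_predict(m, x) for m in models) / len(models)


# Toy usage on noisy synthetic data (values are illustrative only).
data_rng = random.Random(42)
toy_xs = [data_rng.uniform(0.0, 1.0) for _ in range(200)]
toy_ys = [(1.0 if x > 0.5 else 0.0) + data_rng.gauss(0.0, 0.2) for x in toy_xs]
toy_models = fit_bagged(toy_xs, toy_ys, n_bags=5, n_rounds=30, learning_rate=0.2)
print(bagged_predict(toy_models, 0.9))
```

In this sketch, each boosted model reduces bias by repeatedly fitting residuals, while averaging several such models trained on different bootstrap samples smooths out the differences between them. A ranking variant would replace the squared-error residuals with a ranking objective and would likely resample whole queries rather than single examples; both are left out here.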