Parallel boosted regression trees for web search ranking

  • Authors:
  • Stephen Tyree;Kilian Q. Weinberger;Kunal Agrawal;Jennifer Paykin

  • Affiliations:
  • Washington University in St. Louis, St. Louis, MO, USA;Washington University in St. Louis, St. Louis, MO, USA;Washington University in St. Louis, St. Louis, MO, USA;Wesleyan University, Middletown, CT, USA

  • Venue:
  • Proceedings of the 20th International Conference on World Wide Web
  • Year:
  • 2011

Abstract

Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine-learned web-search ranking, a domain notorious for very large data sets. In this paper, we propose a novel method for parallelizing the training of GBRT. Our technique parallelizes the construction of the individual regression trees and operates using the master-worker paradigm as follows. The data are partitioned among the workers. At each iteration, each worker summarizes its data partition using histograms. The master processor uses these histograms to build one layer of a regression tree, and then sends this layer to the workers, allowing the workers to build histograms for the next layer. Our algorithm carefully orchestrates overlap between communication and computation to achieve good performance. Since this approach is based on data partitioning and requires only a small amount of communication, it generalizes to distributed-memory and shared-memory machines, as well as clouds. We present experimental results on both shared-memory machines and clusters for two large-scale web-search ranking data sets. We demonstrate that the loss in accuracy induced by the histogram approximation during regression tree construction can be compensated for with slightly deeper trees. As a result, we see no significant loss in accuracy on the Yahoo data sets and a very small reduction in accuracy on the Microsoft LETOR data. In addition, on shared-memory machines, we obtain almost perfectly linear speedup with up to about 48 cores on the large data sets. On distributed-memory machines, we get a speedup of 25 with 32 processors. Due to data partitioning, our approach can scale to even larger data sets, on which one can reasonably expect even higher speedups.
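
To make the histogram-based master-worker exchange concrete, the following is a minimal, single-process Python sketch, not the authors' code: "workers" summarize their data partitions as per-feature histograms of residual sums and counts, and the "master" scans the merged histograms to choose one split for the current tree layer. All names and parameters here (NUM_BINS, build_histograms, best_split, the squared-error gain) are illustrative assumptions, and communication is simulated by simply summing the workers' histograms.

```python
import numpy as np

NUM_BINS = 32  # coarser bins mean less communication but a rougher approximation

def build_histograms(X_part, residuals, bin_edges):
    """Worker step: summarize a data partition as per-feature histograms
    of residual sums and example counts."""
    n_features = X_part.shape[1]
    grad_hist = np.zeros((n_features, NUM_BINS))
    count_hist = np.zeros((n_features, NUM_BINS))
    for f in range(n_features):
        bins = np.clip(np.digitize(X_part[:, f], bin_edges[f]) - 1, 0, NUM_BINS - 1)
        np.add.at(grad_hist[f], bins, residuals)
        np.add.at(count_hist[f], bins, 1)
    return grad_hist, count_hist

def best_split(grad_hist, count_hist):
    """Master step: pick the (feature, bin) split that maximizes the reduction
    in squared error, using only the aggregated histograms."""
    best = (None, None, -np.inf)
    total_grad, total_count = grad_hist.sum(axis=1), count_hist.sum(axis=1)
    for f in range(grad_hist.shape[0]):
        left_grad = np.cumsum(grad_hist[f])[:-1]
        left_count = np.cumsum(count_hist[f])[:-1]
        right_grad = total_grad[f] - left_grad
        right_count = total_count[f] - left_count
        with np.errstate(divide="ignore", invalid="ignore"):
            gain = np.where(
                (left_count > 0) & (right_count > 0),
                left_grad**2 / left_count + right_grad**2 / right_count,
                -np.inf,
            )
        b = int(np.argmax(gain))
        if gain[b] > best[2]:
            best = (f, b, gain[b])
    return best  # (feature index, bin index, gain)

# Simulated run: partition the data among four "workers", sum their histograms
# at the "master", and choose one split for the current tree layer.
rng = np.random.default_rng(0)
X, y = rng.random((1000, 5)), rng.random(1000)
bin_edges = [np.quantile(X[:, f], np.linspace(0, 1, NUM_BINS + 1)) for f in range(5)]
parts = np.array_split(np.arange(1000), 4)
hists = [build_histograms(X[idx], y[idx], bin_edges) for idx in parts]
grad = sum(h[0] for h in hists)
count = sum(h[1] for h in hists)
print(best_split(grad, count))
```

In a distributed setting, the histogram summation above would be replaced by worker-to-master communication, and the chosen layer of splits would be broadcast back so workers can rebuild histograms for the next layer; this sketch only illustrates why fixed-size histograms, rather than raw examples, are all that needs to be exchanged.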