A parallel random forest classifier for R

Authors:
Lawrence Mitchell;Terence M. Sloan;Muriel Mewissen;Peter Ghazal;Thorsten Forster;Michal Piotrowski;Arthur S. Trew
Affiliation (all authors):
University of Edinburgh, Edinburgh, United Kingdom
Venue:
Proceedings of the second international workshop on Emerging computational methods for the life sciences
Year:
2011

Abstract

The statistical language R is favoured by many biostatisticians for processing microarray data. The quantity of data obtainable from experiments has recently risen significantly, making previously fast analyses time-consuming or even impossible with the existing software infrastructure. High Performance Computing (HPC) systems offer a solution to these problems, but at the expense of increased complexity for the end user. The Simple Parallel R Interface (SPRINT) is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop-in parallelized replacements for existing R functions. In this paper we describe the implementation of a parallel version of the Random Forest classifier in the SPRINT library.
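
The abstract does not say how the forest is parallelized. One common approach, assumed here purely for illustration, is to let each worker grow an independent sub-forest from its own random stream and then concatenate the sub-forests into one ensemble that votes by majority. The sketch below is a toy in Python, not SPRINT's R interface: threads stand in for HPC processes, depth-1 trees (stumps) stand in for full decision trees, and every name (`train_stump`, `train_subforest`, `parallel_forest`, `predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_stump(X, y, rng):
    """Fit a depth-1 tree on a bootstrap sample, using one random feature."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of the rows
    f = rng.randrange(len(X[0]))                # randomly chosen feature
    best = None
    for t in sorted({X[i][f] for i in idx}):    # candidate thresholds
        left = [y[i] for i in idx if X[i][f] <= t]
        right = [y[i] for i in idx if X[i][f] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
        if best is None or errors < best[0]:
            best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left label, right label)


def train_subforest(X, y, n_trees, seed):
    """Work done by one worker: grow n_trees stumps from a private RNG."""
    rng = random.Random(seed)
    return [train_stump(X, y, rng) for _ in range(n_trees)]


def parallel_forest(X, y, n_trees, n_workers, seed=0):
    """Split the trees across workers, then concatenate the sub-forests."""
    counts = [n_trees // n_workers + (w < n_trees % n_workers)
              for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(train_subforest, X, y, c, seed + w)
                   for w, c in enumerate(counts)]
        return [tree for fut in futures for tree in fut.result()]


def predict(forest, row):
    """Majority vote over all trees in the combined forest."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): both features separate class "a" from class "b".
X = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.9, 7.0], [1.0, 8.0], [1.1, 9.0]]
y = ["a", "a", "a", "b", "b", "b"]

forest = parallel_forest(X, y, n_trees=20, n_workers=4, seed=1)
print(len(forest), predict(forest, [0.0, 0.5]), predict(forest, [1.2, 9.5]))
```

Giving each worker its own seed keeps the sub-forests statistically independent and makes the combined forest reproducible regardless of the order in which workers finish. Python threads will not speed up this CPU-bound toy; on an HPC system the same decomposition would run across separate processes.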