On safari to Random Jungle

Authors:
Daniel F. Schwarz;Inke R. König;Andreas Ziegler
Affiliations:
-;-;-
Venue:
Bioinformatics
Year:
2010

Citing 0
Cited 5

A parallel random forest classifier for R
Proceedings of the second international workshop on Emerging computational methods for the life sciences

Comparison of methods for meta-dimensional data analysis using in silico and biological data sets
EvoBIO'12 Proceedings of the 10th European conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics

Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Stratified sampling for feature subspace selection in random forests for high dimensional data
Pattern Recognition

Distinguishing between genomic regions bound by paralogous transcription factors
RECOMB'13 Proceedings of the 17th international conference on Research in Computational Molecular Biology

Quantified Score

Hi-index	3.84

Abstract

Motivation: Genome-wide association (GWA) studies have proven to be a successful approach for helping unravel the genetic basis of complex genetic diseases. However, the identified associations are not well suited for disease prediction, and only a modest portion of the heritability can be explained for most diseases, such as Type 2 diabetes or Crohn's disease. This may partly be due to the low power of standard statistical approaches to detect gene–gene and gene–environment interactions when small marginal effects are present. A promising alternative is Random Forests, which have already been successfully applied in candidate gene analyses. Important single nucleotide polymorphisms are detected by permutation importance measures. To this day, the application to GWA data was highly cumbersome with existing implementations because of the high computational burden.

Results: Here, we present the new freely available software package Random Jungle (RJ), which facilitates the rapid analysis of GWA data. The program yields valid results and computes up to 159 times faster than the fastest alternative implementation, while still maintaining all options of other programs. Specifically, it offers the different permutation importance measures available. It includes new options such as the backward elimination method. We illustrate the application of RJ to a GWA of Crohn's disease. The most important single nucleotide polymorphisms (SNPs) validate recent findings in the literature and reveal potential interactions.

Availability: The RJ software package is freely available at http://www.randomjungle.org

Contact: inke.koenig@imbs.uni-luebeck.de; ziegler@imbs.uni-luebeck.de

Supplementary information: Supplementary data are available at Bioinformatics online.
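The abstract names permutation importance as the way important SNPs are detected but does not describe it. As a rough illustration only, and not Random Jungle's implementation, the idea can be sketched in plain Python: shuffle one feature column at a time and record how much the prediction accuracy drops. Everything in the sketch is an assumption made for illustration: the function names, the 0/1/2 genotype coding, the toy data, and `toy_predict`, a hand-written rule standing in for a trained forest. Random forest implementations typically compute this measure on out-of-bag samples per tree; the sketch uses the full toy data set instead.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay aligned with the labels.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical stand-in for a trained forest (assumption, not RJ):
    predicts 'case' only when SNP 0 and SNP 1 both carry a minor allele."""
    return 1 if row[0] > 0 and row[1] > 0 else 0


def make_toy_data(n=200, seed=1):
    """Random genotypes coded 0/1/2 for three SNPs; SNP 2 is pure noise."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 2) for _ in range(3)] for _ in range(n)]
    labels = [toy_predict(row) for row in rows]
    return rows, labels


rows, labels = make_toy_data()
scores = permutation_importance(toy_predict, rows, labels)
print(scores)
```

In this toy setting the two interacting SNPs receive positive scores even though the label depends on them only jointly, while the noise SNP scores zero because the stand-in predictor never reads it.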