Enriched random forests

Authors:
Dhammika Amaratunga;Javier Cabrera;Yung-Seop Lee
Venue:
Bioinformatics
Year:
2008

Abstract

Although the random forest classification procedure works well in datasets with many features, when the number of features is huge and the percentage of truly informative features is small, such as with DNA microarray data, its performance tends to decline significantly. In such instances, the procedure can be improved by reducing the contribution of trees whose nodes are populated by non-informative features. To some extent, this can be achieved by prefiltering, but we propose a novel, yet simple, adjustment that has demonstrably superior performance: choose the eligible subsets at each node by weighted random sampling instead of simple random sampling, with the weights tilted in favor of the informative features. This results in an ‘enriched random forest’. We illustrate the superior performance of this procedure in several actual microarray datasets.

Contact: damaratu@prdus.jnj.com
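The weighted sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so the Welch-style t-score weighting below is an assumption made purely for illustration, as are the function names `feature_weights`, `sample_eligible_features`, and `mean_informative_hits` and the simulated data. The sketch only compares how often truly informative features land in a node's eligible subset under simple versus weighted random sampling; it does not build a forest.

```python
import numpy as np

def feature_weights(X, y):
    """Hypothetical weighting (an assumption, not taken from the abstract):
    absolute Welch-style t-score per feature, normalized to sum to 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    w = t + 1e-12  # keep every weight strictly positive
    return w / w.sum()

def sample_eligible_features(weights, m, rng):
    """Draw m distinct feature indices, with probabilities tilted by the weights."""
    return rng.choice(len(weights), size=m, replace=False, p=weights)

def mean_informative_hits(weights, m, informative, rng, draws=200):
    """Average number of informative features per eligible subset."""
    hits = 0
    for _ in range(draws):
        idx = sample_eligible_features(weights, m, rng)
        hits += np.intersect1d(idx, informative).size
    return hits / draws

# Simulated two-class data: 500 features, only the first 5 informative.
rng = np.random.default_rng(0)
n, p = 40, 500
informative = np.arange(5)
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[np.ix_(y == 1, informative)] += 3.0

m = int(np.sqrt(p))               # subset size tried at each node
uniform = np.full(p, 1.0 / p)     # simple random sampling
enriched = feature_weights(X, y)  # weighted random sampling

uniform_hits = mean_informative_hits(uniform, m, informative, rng)
enriched_hits = mean_informative_hits(enriched, m, informative, rng)
print("simple random sampling:  ", uniform_hits)
print("weighted random sampling:", enriched_hits)
```

With so few informative features, a uniformly drawn subset rarely contains any of them, which is the failure mode the abstract describes; tilting the weights raises that rate without discarding any feature outright, in contrast to prefiltering.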