Gene Selection Using Iterative Feature Elimination Random Forests for Survival Outcomes

  • Authors:
  • Herbert Pang; Stephen L. George; Ken Hui; Tiejun Tong

  • Affiliations:
  • Duke University, Durham; Duke University, Durham; Yale University, New Haven; Hong Kong Baptist University, Hong Kong

  • Venue:
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
  • Year:
  • 2012

Abstract

Although many feature selection methods have been developed for classification, there is still a need for methods that identify genes in high-dimensional data with censored survival outcomes. Traditional gene selection approaches for classification have several drawbacks. First, the majority of them are single-gene based. Second, many of the selection procedures are not embedded within the learning algorithm itself. Random forests have been found to perform well in high-dimensional settings with survival outcomes, and they provide an embedded measure of variable importance. They are therefore an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria on several real data sets. Our method performs well in both simulations and real data analyses, and we show its advantages over single-gene-based approaches. Because it incorporates multivariate correlations among genes, the described method makes better use of the information available in microarray data with survival outcomes.
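
The core loop of such an approach, iteratively fitting a random survival forest, ranking genes by an importance measure, and discarding the least important ones, can be sketched as follows. This is a minimal illustration only, assuming the scikit-survival and scikit-learn packages; permutation importance stands in for the forest's embedded importance measure, and the function name, drop fraction, and stopping rule are hypothetical choices rather than the authors' published settings.

```python
# Minimal sketch of iterative feature elimination with a random survival forest.
# Assumes scikit-survival and scikit-learn; the drop fraction and stopping rule
# are illustrative choices, not the settings used in the paper.
import numpy as np
from sklearn.inspection import permutation_importance
from sksurv.ensemble import RandomSurvivalForest


def iterative_feature_elimination(X, y, drop_fraction=0.2, min_features=10,
                                  random_state=0):
    """Repeatedly fit a random survival forest on the surviving genes and
    drop the least important ones until at most `min_features` remain.

    X : (n_samples, n_genes) expression matrix
    y : structured array with event indicator and survival time, as expected
        by scikit-survival estimators
    """
    features = np.arange(X.shape[1])
    while features.size > min_features:
        rsf = RandomSurvivalForest(n_estimators=200, random_state=random_state)
        rsf.fit(X[:, features], y)

        # Rank genes by permutation importance; rsf.score is the concordance index.
        imp = permutation_importance(rsf, X[:, features], y,
                                     n_repeats=5, random_state=random_state)

        # Keep the top (1 - drop_fraction) of genes, never fewer than
        # min_features, and always strictly fewer than the current set.
        n_keep = int(np.floor((1.0 - drop_fraction) * features.size))
        n_keep = max(min_features, min(n_keep, features.size - 1))
        order = np.argsort(imp.importances_mean)[::-1]
        features = np.sort(features[order[:n_keep]])
    return features
```

In practice, the importance measure, the fraction of genes eliminated per iteration, and the node split criterion (e.g., log-rank based splits) would follow the settings studied in the paper; the sketch only conveys the iterative elimination structure.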