Pathway analysis using random forests classification and regression

Authors:
Herbert Pang;Aiping Lin;Matthew Holford;Bradley E. Enerson;Bin Lu;Michael P. Lawton;Eugenia Floyd;Hongyu Zhao
Affiliations:
Division of Biostatistics, Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA;W. M. Keck Biotechnology Resource Laboratory, Yale University School of Medicine, New Haven, CT 06520, USA;Division of Biostatistics, Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA;Boyer Center for Molecular Medicine, Yale University School of Medicine, New Haven, CT 06520, USA;Pfizer Groton Laboratories, Safety Sciences, Groton, CT 06340, USA;Pfizer Groton Laboratories, Safety Sciences, Groton, CT 06340, USA;Pfizer Groton Laboratories, Safety Sciences, Groton, CT 06340, USA;Division of Biostatistics, Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA
Venue:
Bioinformatics
Year:
2006

Abstract

Motivation: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers.

Results: In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy in dissecting pathways and can provide biological insight into the study of microarray data.

Availability: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm

Contact: hongyu.zhao@yale.edu

Supplementary Information: Supplementary Data are available at http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm
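
Illustration (not from the paper): the pathway-ranking idea described in the Results can be sketched in a few lines. The sketch below is not the authors' R code. It assumes, for illustration only, that each pathway is scored by the out-of-bag (OOB) error of an ensemble trained on that pathway's genes alone, and that pathways with lower error are ranked as more important. To stay dependency-free it swaps in bagged decision stumps with random gene subsets as a simplified stand-in for Random Forests. All function names, gene names, pathway names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def fit_stump(X, y, rows, genes):
    """Find the single-gene threshold split with the fewest errors on `rows`."""
    best = None  # (errors, gene, threshold, label assigned to the low side)
    for g in genes:
        values = sorted({X[g][i] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for left_label in (0, 1):
                errs = sum(
                    (left_label if X[g][i] <= t else 1 - left_label) != y[i]
                    for i in rows
                )
                if best is None or errs < best[0]:
                    best = (errs, g, t, left_label)
    return best


def pathway_oob_error(X, y, genes, n_trees, rng):
    """OOB error of bagged stumps that only see the genes of one pathway."""
    n = len(y)
    mtry = max(1, round(len(genes) ** 0.5))  # genes tried per stump
    votes = defaultdict(lambda: [0, 0])      # sample index -> votes per class
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(boot)
        stump = fit_stump(X, y, boot, rng.sample(genes, mtry))
        if stump is None:  # no valid split (all values identical)
            continue
        _, g, t, left_label = stump
        for i in range(n):
            if i not in in_bag:  # only out-of-bag samples get a vote
                pred = left_label if X[g][i] <= t else 1 - left_label
                votes[i][pred] += 1
    wrong = sum((v[1] > v[0]) != (y[i] == 1) for i, v in votes.items())
    return wrong / len(votes)


def rank_pathways(X, y, pathways, n_trees=200, seed=0):
    """Return (pathway, OOB error) pairs sorted from lowest to highest error."""
    rng = random.Random(seed)
    scores = {
        name: pathway_oob_error(X, y, genes, n_trees, rng)
        for name, genes in pathways.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


# Toy data: 20 samples per class; geneA and geneB separate the classes,
# the remaining genes are pure noise.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = {
    "geneA": [data_rng.gauss(4 * c, 0.5) for c in y],
    "geneB": [data_rng.gauss(4 * c, 0.5) for c in y],
    "geneC": [data_rng.gauss(0, 1) for _ in y],
    "geneD": [data_rng.gauss(0, 1) for _ in y],
    "geneE": [data_rng.gauss(0, 1) for _ in y],
}
pathways = {
    "signal": ["geneA", "geneB", "geneC"],
    "noise": ["geneD", "geneE"],
}

ranking = rank_pathways(X, y, pathways)
for name, err in ranking:
    print(f"{name}: OOB error = {err:.3f}")
```

In practice the stumps would be replaced by a full Random Forests implementation, and the pathway gene lists would come from an external pathway database rather than being written by hand.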