An optimum random forest model for prediction of genetic susceptibility to complex diseases

Authors:
Weidong Mao;Shannon Kelly
Affiliations:
Department of Computer Science, Shippensburg University, Shippensburg, PA;Department of Computer Science, Shippensburg University, Shippensburg, PA
Venue:
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2007

Abstract

High-throughput single nucleotide polymorphism (SNP) genotyping technologies have made massive genotype data sets, covering large numbers of individuals, publicly available. This accessibility of genetic data makes genome-wide association studies of complex diseases possible. One of the most challenging issues in genome-wide association studies is searching for and analyzing genetic risk factors that result from interactions among multiple genes. Such an integrated risk factor usually has a higher risk rate than a single SNP. This paper explores the possibility of applying random forests to search for disease-associated factors in given case/control samples. An optimum random forest based algorithm is proposed for the disease susceptibility prediction problem. The proposed method has been applied to publicly available genotype data on Crohn's disease and autoimmune disorders to predict susceptibility to these diseases. The prediction accuracy achieved is higher than that achieved by universal prediction methods such as the Support Vector Machine (SVM) and by previously known methods.
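
The general idea of using a random forest on case/control genotype data can be illustrated with a minimal sketch. This is a plain random forest written with the Python standard library only; it is not the optimum random forest algorithm proposed in the paper, whose details the abstract does not give. The sketch assumes that each genotype is encoded as 0, 1, or 2 (the number of minor alleles), that labels are 1 for case and 0 for control, and that a toy data set in which risk arises from the interaction of the first two SNPs is a fair stand-in for real data. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def build_tree(rows, labels, n_try, depth, rng):
    """Grow one tree; a leaf is an int label, an inner node is a tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    # Consider only a random subset of SNPs at this node.
    for snp in rng.sample(range(len(rows[0])), n_try):
        for cut in (0.5, 1.5):  # the two possible cuts for 0/1/2 genotypes
            left = [i for i, r in enumerate(rows) if r[snp] < cut]
            right = [i for i, r in enumerate(rows) if r[snp] >= cut]
            if not left or not right:
                continue
            # Size-weighted impurity of the two children (lower is better).
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, snp, cut, left, right)
    if best is None:
        return majority
    _, snp, cut, left, right = best
    kids = [build_tree([rows[i] for i in part], [labels[i] for i in part],
                       n_try, depth - 1, rng) for part in (left, right)]
    return (snp, cut, kids[0], kids[1])


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(node, tuple):
        snp, cut, left, right = node
        node = left if row[snp] < cut else right
    return node


def fit_forest(rows, labels, n_trees, n_try, depth, seed):
    """Train each tree on a bootstrap sample of the individuals."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, depth, rng))
    return forest


def predict(forest, row):
    """Majority vote over all trees: 1 = case, 0 = control."""
    votes = sum(predict_tree(tree, row) for tree in forest)
    return 1 if 2 * votes > len(forest) else 0


def make_toy_data(n, n_snps, seed, noise=0.05):
    """Synthetic data (assumption): risk needs a minor allele at SNP 0 and SNP 1."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        g = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        label = int(g[0] >= 1 and g[1] >= 1)
        if rng.random() < noise:  # flip a small fraction of labels
            label = 1 - label
        rows.append(g)
        labels.append(label)
    return rows, labels


train_x, train_y = make_toy_data(300, 6, seed=1)
test_x, test_y = make_toy_data(100, 6, seed=2)
forest = fit_forest(train_x, train_y, n_trees=51, n_try=3, depth=5, seed=0)
hits = sum(predict(forest, x) == y for x, y in zip(test_x, test_y))
accuracy = hits / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because each tree splits on several SNPs in sequence, a forest of this kind can pick up risk that depends on a combination of SNPs rather than on any single one, which is the motivation the abstract gives for trying random forests on this problem.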