Application of a genetic algorithm — support vector machine hybrid for prediction of clinical phenotypes based on genome-wide SNP profiles of sib pairs

  • Authors:
  • Binsheng Gong;Zheng Guo;Jing Li;Guohua Zhu;Sali Lv;Shaoqi Rao;Xia Li

  • Affiliations:
  • Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China; Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China; Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China; Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH; Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China; Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China; Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China

  • Venue:
  • FSKD'05 Proceedings of the Second International Conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
  • Year:
  • 2005

Abstract

Large-scale genome-wide genetic profiling with single nucleotide polymorphism (SNP) markers makes it possible to investigate whether these biomarkers can predict genetic risk. Such data have a special structure: high dimensionality, a low signal-to-noise ratio, and correlations between genes, combined with a relatively small sample size, so their analysis requires special strategies. We propose a robust data reduction technique based on a hybrid of a genetic algorithm and a support vector machine. The main goal of this hybridization is to exploit their respective merits (e.g., robustness to the size of the solution space and the capability to handle a very large number of features) to identify key SNP features for risk prediction. We applied the approach to the Genetic Analysis Workshop 14 COGA data to predict the affection status of a sib pair from genome-wide SNP identity-by-descent (IBD) information. This application demonstrates the approach's potential to extract useful information from massive SNP data.
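
The abstract does not give the chromosome encoding, fitness function, or SVM settings, so the following is only a minimal sketch of how a genetic algorithm can wrap a support vector machine for feature selection. It assumes a bitmask encoding (one bit per SNP feature), a linear SVM trained with a Pegasos-style subgradient method, cross-validated accuracy minus a size penalty as fitness, and synthetic data standing in for centered IBD-sharing values. All function names and parameter values are hypothetical and are not taken from the paper.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=5, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss.
    Labels are +1/-1; there is no bias term, so features should be centered."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrink (L2 penalty)
            if margin < 1.0:                           # hinge-loss subgradient
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def cv_accuracy(X, y, mask, folds=3):
    """Cross-validated accuracy of the SVM using only features where mask == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    Xs = [[row[j] for j in cols] for row in X]
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(Xs)) if i % folds != f]
        test = [i for i in range(len(Xs)) if i % folds == f]
        w = train_linear_svm([Xs[i] for i in train], [y[i] for i in train])
        for i in test:
            score = sum(wj * xj for wj, xj in zip(w, Xs[i]))
            correct += (1 if score >= 0 else -1) == y[i]
    return correct / len(Xs)

def ga_select(X, y, pop_size=12, generations=8, p_mut=0.1, penalty=0.01, seed=1):
    """GA over feature bitmasks; fitness = CV accuracy minus a size penalty."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = cv_accuracy(X, y, mask) - penalty * sum(mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        nxt = [pop[0][:]]                                     # elitism
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)          # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

def make_toy_data(n=60, p=12, seed=2):
    """Synthetic stand-in for centered IBD sharing (-0.5, 0, 0.5); only the
    first two features carry signal. This is not the COGA data."""
    rng = random.Random(seed)
    X = [[rng.choice([-0.5, 0.0, 0.5]) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] + row[1] + rng.gauss(0, 0.1) > 0 else -1 for row in X]
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    best, history = ga_select(X, y)
    print("selected features:", [j for j, bit in enumerate(best) if bit])
    print("best fitness per generation:", [round(h, 3) for h in history])
```

Because the best mask is carried over unchanged each generation and fitness is deterministic for a given mask, the best fitness can never decrease from one generation to the next. On real data the size penalty, kernel, and population settings would need tuning, and the feature subset should be validated on data not used during the search.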