Application of a genetic algorithm — support vector machine hybrid for prediction of clinical phenotypes based on genome-wide SNP profiles of sib pairs

Authors:
Binsheng Gong;Zheng Guo;Jing Li;Guohua Zhu;Sali Lv;Shaoqi Rao;Xia Li
Affiliations:
Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China;Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China;Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China;Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH;Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China;Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China;Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China
Venue:
FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
Year:
2005

Abstract

Large-scale genome-wide genetic profiling with single nucleotide polymorphism (SNP) markers offers an opportunity to investigate whether these biomarkers can be used to predict genetic risk. Because the data are characterized by high dimensionality, a low signal-to-noise ratio, and correlations between genes, but a relatively small sample size, their analysis requires special strategies. We propose a robust data reduction technique based on a hybrid of a genetic algorithm and a support vector machine. The major goal of this hybridization is to fully exploit their respective merits (e.g., robustness to the size of the solution space and the capability of handling a very large number of features) to identify key SNP features for risk prediction. We applied the approach to the Genetic Analysis Workshop 14 COGA data to predict the affection status of a sib pair from genome-wide SNP identical-by-descent (IBD) information. This application demonstrated the approach's potential to extract useful information from massive SNP data.
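
The abstract does not give the details of the hybrid, so the following is only a minimal sketch of how a genetic algorithm can wrap a support vector machine for feature selection, not the authors' implementation. Each individual is a boolean mask over SNP features, and its fitness is the cross-validated accuracy of an SVM trained on the selected columns. The function names (`svm_cv_fitness`, `ga_select`), the linear kernel, the cross-validation fitness, and the GA operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism) are all assumptions made for illustration. The fitness function assumes scikit-learn is installed.

```python
import random


def svm_cv_fitness(X, y, mask, folds=3):
    """Hypothetical fitness: cross-validated accuracy of a linear SVM
    trained only on the features switched on in `mask`."""
    # scikit-learn is imported here so the GA loop below does not depend on it.
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:  # an empty feature subset cannot be scored
        return 0.0
    X_sub = [[row[j] for j in cols] for row in X]
    return float(cross_val_score(SVC(kernel="linear"), X_sub, y, cv=folds).mean())


def ga_select(n_features, fitness, pop_size=20, generations=30,
              p_mut=0.05, seed=0):
    """Search boolean feature masks with a simple generational GA.

    `fitness` maps a mask (list of bool) to a score to maximise.
    Requires n_features >= 2. Returns (best_mask, best_score).
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    best, best_fit = None, float("-inf")

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        for fit, ind in scored:
            if fit > best_fit:
                best, best_fit = ind[:], fit

        def pick():
            # Binary tournament: keep the fitter of two random individuals.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        nxt = [best[:]]  # elitism: carry the best mask found so far forward
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per feature.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt

    return best, best_fit
```

With a feature matrix `X` (one row per sib pair, one column per SNP IBD feature) and labels `y`, a call such as `ga_select(len(X[0]), lambda m: svm_cv_fitness(X, y, m))` would return the best mask found and its score. Because the same samples drive both selection and scoring, that score is optimistic, and a held-out set would be needed for an honest estimate of predictive accuracy.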