SNPboost: interaction analysis and risk prediction on GWA data

Authors:
Ingrid Brænne;Jeanette Erdmann;Amir Madany Mamlouk
Affiliations:
Institute for Neuro- and Bioinformatics and Medizinische Klinik II and Graduate School for Computing in Medicine and Life Sciences, University of Lübeck, Lübeck, Germany;Medizinische Klinik II and Graduate School for Computing in Medicine and Life Sciences, University of Lübeck, Lübeck, Germany;Institute for Neuro- and Bioinformatics and Graduate School for Computing in Medicine and Life Sciences, University of Lübeck, Lübeck, Germany
Venue:
ICANN'11 Proceedings of the 21st international conference on Artificial neural networks - Volume Part II
Year:
2011

Abstract

Genome-wide association (GWA) studies, which typically aim to identify single nucleotide polymorphisms (SNPs) associated with a disease, yield large amounts of high-dimensional data. GWA studies have been successful in identifying single SNPs associated with complex diseases. So far, however, most of the identified associations have only a limited impact on risk prediction. Recent studies applying support vector machines (SVMs) have improved risk prediction for Type I and Type II diabetes; a drawback, however, is the poor interpretability of the classifier. Training the SVM on only a subset of SNPs requires a preselection step, typically based on p-values. Especially for complex diseases, this might not be the optimal selection strategy. In this work, we propose an extension of AdaBoost for GWA data, called SNPboost. To improve classification, SNPboost successively selects a subset of SNPs. On real GWA data (German MI family study II), SNPboost outperformed a linear SVM and further improved the performance of a non-linear SVM when used as a preselector. Finally, we argue that the selected SNPs can be placed in a biological context.
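The idea of selecting SNPs successively by boosting can be illustrated with a minimal sketch of standard AdaBoost over single-SNP threshold classifiers. This is not the authors' implementation: the abstract does not specify the weak learners, so the genotype coding (0/1/2), the thresholds, the function names fit_stump_boost and predict, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_stump_boost(X, y, n_rounds=5):
    """AdaBoost with single-SNP threshold stumps (illustrative assumption).

    X: (n_samples, n_snps) genotypes coded 0/1/2; y: labels in {-1, +1}.
    Returns a list of (snp_index, threshold, polarity, alpha) tuples.
    """
    n, _ = X.shape
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        best = None
        for thr in (0.5, 1.5):
            pred = np.where(X > thr, 1, -1)  # stump output for every SNP
            # weighted error of each SNP's stump under the current weights
            err = (w[:, None] * (pred != y[:, None])).sum(axis=0)
            # polarity -1 flips the stump, so its error is 1 - err
            for pol, e in ((1, err), (-1, 1.0 - err)):
                j = int(np.argmin(e))
                if best is None or e[j] < best[0]:
                    best = (e[j], j, thr, pol)
        err_j, j, thr, pol = best
        err_j = np.clip(err_j, 1e-10, 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * np.log((1 - err_j) / err_j)
        h = pol * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * h)  # up-weight misclassified samples
        w /= w.sum()
        model.append((j, thr, pol, alpha))
    return model

def predict(model, X):
    """Sign of the alpha-weighted vote of the selected stumps."""
    score = np.zeros(X.shape[0])
    for j, thr, pol, alpha in model:
        score += alpha * pol * np.where(X[:, j] > thr, 1, -1)
    return np.where(score >= 0, 1, -1)

# Synthetic data (assumption): 200 samples, 50 SNPs, label driven by SNP 7.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50))
y = np.where(X[:, 7] > 0.5, 1, -1)

model = fit_stump_boost(X, y)
selected_snps = sorted({j for j, _, _, _ in model})
```

In this sketch the set of SNP indices appearing in the model plays the role of the selected subset; the same SNP can be picked in several rounds, so the subset may be smaller than the number of rounds. Such a subset could then be passed to another classifier, which is the preselector use the abstract mentions.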