SNPboost: interaction analysis and risk prediction on GWA data

  • Authors:
  • Ingrid Brænne;Jeanette Erdmann;Amir Madany Mamlouk

  • Affiliations:
  • Institute for Neuro- and Bioinformatics and Medizinische Klinik II and Graduate School for Computing in Medicine and Life Sciences, University of Lübeck, Lübeck, Germany;Medizinische Klinik II and Graduate School for Computing in Medicine and Life Sciences, University of Lübeck, Lübeck, Germany;Institute for Neuro- and Bioinformatics and Graduate School for Computing in Medicine and Life Sciences, University of Lübeck, Lübeck, Germany

  • Venue:
  • ICANN'11 Proceedings of the 21st international conference on Artificial neural networks - Volume Part II
  • Year:
  • 2011

Abstract

Genome-wide association (GWA) studies, which typically aim to identify single nucleotide polymorphisms (SNPs) associated with a disease, yield large amounts of high-dimensional data. GWA studies have been successful in identifying single SNPs associated with complex diseases. So far, however, most of the identified associations have only a limited impact on risk prediction. Recent studies applying support vector machines (SVMs) have improved risk prediction for Type I and II diabetes; a drawback, however, is the poor interpretability of the classifier. Training the SVM on only a subset of SNPs requires a preselection, typically by p-value. Especially for complex diseases, this might not be the optimal selection strategy. In this work, we propose SNPboost, an extension of AdaBoost for GWA data. SNPboost successively selects a subset of SNPs in order to improve classification. On real GWA data (German MI family study II), SNPboost outperformed a linear SVM and, when used as a preselector, further improved the performance of a non-linear SVM. Finally, we argue that the selected SNPs can be placed in a biological context.
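
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of boosting with successive SNP selection. It is plain AdaBoost with single-SNP decision stumps, not the authors' SNPboost. The function names, the 0/1/2 genotype coding, the thresholds 0.5 and 1.5, and the toy data are all assumptions made for illustration.

```python
import math

def stump_predict(x, snp, threshold, polarity):
    # Weak classifier on one SNP: genotype above the threshold votes `polarity`.
    return polarity if x[snp] > threshold else -polarity

def boost_snps(X, y, n_rounds=5):
    """Generic AdaBoost over single-SNP stumps (illustrative, hypothetical helper).

    X: list of genotype vectors coded 0/1/2 (assumed coding).
    y: list of labels in {-1, +1}.
    Returns a list of (alpha, snp, threshold, polarity) tuples.
    """
    n, n_snps = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform sample weights to start
    ensemble = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for snp in range(n_snps):
            for threshold in (0.5, 1.5):
                for polarity in (1, -1):
                    err = sum(
                        wi for wi, xi, yi in zip(w, X, y)
                        if stump_predict(xi, snp, threshold, polarity) != yi
                    )
                    if best is None or err < best[0]:
                        best = (err, snp, threshold, polarity)
        raw_err, snp, threshold, polarity = best
        # Clamp the error so the logarithm below stays finite.
        err = min(max(raw_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, snp, threshold, polarity))
        if raw_err == 0:
            break  # a single stump already separates the training data
        # Increase the weight of misclassified samples, then renormalise.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(xi, snp, threshold, polarity))
            for wi, xi, yi in zip(w, X, y)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    # Weighted vote of all selected stumps.
    score = sum(a * stump_predict(x, s, t, p) for a, s, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (made up): 8 individuals, 3 SNPs; the label follows SNP index 1.
TOY_X = [
    [0, 0, 1], [1, 0, 2], [2, 0, 0], [0, 1, 1],
    [1, 2, 0], [2, 1, 2], [0, 2, 1], [1, 0, 0],
]
TOY_Y = [-1, -1, -1, 1, 1, 1, 1, -1]

if __name__ == "__main__":
    model = boost_snps(TOY_X, TOY_Y)
    print("selected SNP indices:", [snp for _, snp, _, _ in model])
```

In a sketch like this, the SNPs appearing in the returned ensemble form the selected subset, which is what makes a boosted model easier to inspect than an SVM trained on all SNPs. On the toy data the first stump uses SNP index 1, since that SNP alone determines the label.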