Sparse coding for feature selection on genome-wide association data

  • Authors:
  • Ingrid Brænne;Kai Labusch;Amir Madany Mamlouk

  • Affiliations:
  • Inst. for Neuro and Bioinf., Univ. of Lübeck, Lübeck, Germany and Medizinische Klinik II, Univ. of Lübeck, Lübeck, Germany and Graduate School for Computing in Medicine and Lif ...;Institute for Neuro and Bioinformatics, University of Lübeck, Lübeck, Germany;Institute for Neuro and Bioinf., Univ. of Lübeck, Lübeck, Germany and University of Lübeck, Lübeck, Germany and Graduate School for Computing in Medicine and Life Sciences, ...

  • Venue:
  • ICANN'10 Proceedings of the 20th international conference on Artificial neural networks: Part I
  • Year:
  • 2010

Abstract

Genome-wide association (GWA) studies provide large amounts of high-dimensional data. GWA studies aim to identify variables that increase the risk for a given phenotype. Univariate analyses have provided some insights, but it appears that most diseases are affected by interactions of multiple factors, which can only be identified through multivariate analysis. However, multivariate analysis of the discrete, high-dimensional, low-sample-size GWA data is complicated by the presence of random effects and nonspecific coupling. In this work, we investigate the suitability of three standard techniques (p-values, SVM, PCA) for analyzing GWA data on several simulated datasets. We compare these standard techniques against a sparse coding approach and demonstrate that sparse coding clearly outperforms them and can identify interacting factors in far higher-dimensional datasets than the other three approaches.
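The claim that interacting factors can escape univariate analysis can be illustrated with a toy example. The sketch below is a hypothetical illustration, not the paper's simulation setup or its sparse coding method: it assumes binary carrier status per SNP, a balanced design, and a phenotype determined purely by an XOR of two SNPs. Each SNP's univariate Pearson chi-square statistic against the phenotype (the kind of per-variable test behind p-value ranking) is exactly zero for the two causal SNPs, yet looking at both SNPs jointly predicts the phenotype perfectly. The helper names `chi2_2x2` and `joint_lookup_accuracy` are made up for this sketch.

```python
import numpy as np

def chi2_2x2(x, y):
    """Pearson chi-square statistic for two binary (0/1) vectors, no continuity correction."""
    # Index 2*x + y maps each sample to one cell of the 2x2 contingency table.
    table = np.bincount(2 * x + y, minlength=4).reshape(2, 2).astype(float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def joint_lookup_accuracy(g0, g1, y):
    """Accuracy of predicting y by majority vote within each (g0, g1) genotype cell."""
    cell = 2 * g0 + g1
    correct = 0
    for c in range(4):
        mask = cell == c
        if mask.any():
            majority = int(y[mask].mean() >= 0.5)
            correct += int((y[mask] == majority).sum())
    return correct / len(y)

# Hypothetical toy data: SNPs 0 and 1 take every combination equally often.
rng = np.random.default_rng(0)
n_rep, n_noise = 250, 20
base = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
pair = np.repeat(base, n_rep, axis=0)                 # shape (1000, 2)
noise = rng.integers(0, 2, size=(pair.shape[0], n_noise))
X = np.hstack([pair, noise])                           # SNPs 2.. are unrelated noise
y = pair[:, 0] ^ pair[:, 1]                            # phenotype = pure XOR interaction

scores = [chi2_2x2(X[:, j], y) for j in range(X.shape[1])]
print("univariate chi2, causal SNPs:", scores[0], scores[1])
print("joint accuracy of SNP 0 + SNP 1:", joint_lookup_accuracy(X[:, 0], X[:, 1], y))
```

Ranking SNPs by their marginal statistics would place the two causal SNPs at or below every noise SNP, which is why a multivariate method is needed to recover such interactions.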