Sparse coding for feature selection on genome-wide association data

Authors:
Ingrid Brænne;Kai Labusch;Amir Madany Mamlouk
Affiliations:
Inst. for Neuro and Bioinf., Univ. of Lübeck, Lübeck, Germany and Medizinische Klinik II, Univ. of Lübeck, Lübeck, Germany and Graduate School for Computing in Medicine and Lif ...;Institute for Neuro and Bioinformatics, University of Lübeck, Lübeck, Germany;Institute for Neuro and Bioinf., Univ. of Lübeck, Lübeck, Germany and University of Lübeck, Lübeck, Germany and Graduate School for Computing in Medicine and Life Sciences, ...
Venue:
ICANN'10 Proceedings of the 20th international conference on Artificial neural networks: Part I
Year:
2010

Abstract

Genome-wide association (GWA) studies provide large amounts of high-dimensional data and aim to identify variables that increase the risk for a given phenotype. Univariate examinations have provided some insights, but it appears that most diseases are affected by interactions of multiple factors, which can only be identified through multivariate analysis. However, multivariate analysis of the discrete, high-dimensional, low-sample-size GWA data is made more difficult by the presence of random effects and nonspecific coupling. In this work, we investigate the suitability of three standard techniques (p-values, SVM, PCA) for analyzing GWA data on several simulated datasets. We compare these standard techniques against a sparse coding approach and demonstrate that sparse coding clearly outperforms them and can identify interacting factors in far higher-dimensional datasets than the other three approaches.
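The point that interacting factors can escape univariate examination can be illustrated with a small simulation. The following is a minimal sketch only, not the simulation or the sparse coding method used in the paper: the function names `simulate_gwa` and `chi2_scores`, the XOR-style risk model, the minor allele frequency, and the risk values 0.8 and 0.2 are all assumptions chosen for illustration. It generates genotypes coded as minor allele counts (0, 1, 2), lets the phenotype depend only on the interaction of two SNPs, and computes a per-SNP chi-square statistic of the kind that underlies a univariate p-value.

```python
import numpy as np


def simulate_gwa(n_samples=400, n_snps=50, causal=(0, 1), maf=0.3, seed=0):
    """Simulate genotypes and a phenotype driven by a two-SNP interaction.

    Hypothetical model: risk is raised only when exactly one of the two
    causal SNPs carries a minor allele (XOR), so each SNP on its own has
    almost no marginal effect.
    """
    rng = np.random.default_rng(seed)
    # Genotypes as minor allele counts 0/1/2, drawn binomially per SNP.
    X = rng.binomial(2, maf, size=(n_samples, n_snps))
    a, b = causal
    carrier_a = X[:, a] > 0
    carrier_b = X[:, b] > 0
    # Assumed risk levels: 0.8 if exactly one SNP is carried, else 0.2.
    risk = np.where(carrier_a ^ carrier_b, 0.8, 0.2)
    y = (rng.random(n_samples) < risk).astype(int)
    return X, y


def chi2_scores(X, y):
    """Per-SNP chi-square statistic on the 3x2 genotype-by-phenotype table."""
    X = np.asarray(X)
    y = np.asarray(y)
    n = X.shape[0]
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        # Count samples for each (genotype, phenotype) combination.
        table = np.zeros((3, 2))
        np.add.at(table, (X[:, j], y), 1)
        expected = (table.sum(axis=1, keepdims=True)
                    * table.sum(axis=0, keepdims=True) / n)
        # Skip cells with zero expected count (unobserved genotypes).
        mask = expected > 0
        scores[j] = (((table - expected) ** 2)[mask] / expected[mask]).sum()
    return scores


if __name__ == "__main__":
    X, y = simulate_gwa()
    scores = chi2_scores(X, y)
    print("scores of the two interacting SNPs:", scores[:2])
    print("largest score among all SNPs:", scores.max())
```

Under this assumed model, the two interacting SNPs would be expected to receive univariate scores that do not stand out from those of the noise SNPs, which is the situation in which a multivariate method is needed.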