Motivation: Genome-wide association studies (GWAS) involving half a million or more single nucleotide polymorphisms (SNPs) allow genetic dissection of complex diseases in a holistic manner. The common practice of analyzing one SNP at a time does not fully realize the potential of GWAS to identify multiple causal variants and to predict risk of disease. Existing methods for joint analysis of GWAS data tend to miss causal SNPs that are marginally uncorrelated with disease, and they have high false discovery rates (FDRs).

Results: We introduce GWASelect, a statistically powerful and computationally efficient variable selection method designed to tackle the unique challenges of GWAS data. The method searches iteratively over the candidate SNPs, conditional on the SNPs already selected, and is thus capable of capturing causal SNPs that are marginally correlated with disease as well as those that are marginally uncorrelated with disease. A special resampling mechanism is built into the method to reduce false positive findings. Simulation studies demonstrate that GWASelect performs well under a wide spectrum of linkage disequilibrium patterns and can be substantially more powerful than existing methods in capturing causal variants while having a lower FDR. In addition, regression models based on GWASelect tend to yield more accurate prediction of disease risk than existing methods. The advantages of GWASelect are illustrated with the Wellcome Trust Case-Control Consortium (WTCCC) data.

Availability: The software implementing GWASelect is available at http://www.bios.unc.edu/~lin. Access to WTCCC data: http://www.wtccc.org.uk/

Contact: lin@bios.unc.edu

Supplementary information: Supplementary data are available at Bioinformatics Online.
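
The two ideas named in the Results, a search that conditions on previously selected SNPs and a resampling step that filters false positives, can be illustrated with a minimal sketch. This is not the GWASelect implementation: it swaps in greedy least-squares forward selection on a 0/1 phenotype as a stand-in, and the function names, the half-sample scheme, and the frequency threshold mentioned afterwards are all assumptions made for illustration.

```python
import numpy as np


def conditional_forward_select(X, y, n_select):
    """Greedy stand-in for a conditional search (hypothetical helper).

    At each step, pick the column most correlated with the residual of y
    after regressing y on the columns selected so far.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center genotypes
    yc = y - y.mean()                # center phenotype
    norms = np.linalg.norm(Xc, axis=0)
    taken = np.zeros(p, dtype=bool)  # marks columns already selected
    selected = []
    residual = yc.copy()
    for _ in range(n_select):
        raw = np.abs(Xc.T @ residual)
        # Scale by column norm; constant columns get a score of zero.
        scores = np.divide(raw, norms, out=np.zeros(p), where=norms > 0)
        scores[taken] = -np.inf
        best = int(np.argmax(scores))
        selected.append(best)
        taken[best] = True
        # Refit on all selected columns and update the residual, so the
        # next pick is conditional on what has been chosen already.
        coef, *_ = np.linalg.lstsq(Xc[:, selected], yc, rcond=None)
        residual = yc - Xc[:, selected] @ coef
    return selected


def selection_frequencies(X, y, n_select, n_resamples=50, seed=0):
    """Fraction of random half-samples in which each column is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        for j in conditional_forward_select(X[idx], y[idx], n_select):
            counts[j] += 1
    return counts / n_resamples
```

Under these assumptions, one would keep only the columns whose selection frequency exceeds a chosen cutoff (say 0.8), on the reasoning that a spurious SNP is unlikely to be picked consistently across different subsamples.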