Efficiently identifying significant associations in genome-wide association studies

  • Authors:
  • Emrah Kostem; Eleazar Eskin

  • Affiliations:
  • Computer Science Department, University of California, Los Angeles, California; Computer Science Department, University of California, Los Angeles, California

  • Venue:
  • RECOMB'13: Proceedings of the 17th International Conference on Research in Computational Molecular Biology
  • Year:
  • 2013

Abstract

Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common diseases. More recently, the GWAS approach has been used to identify regions of the genome that harbor variation affecting gene expression, known as expression quantitative trait loci (eQTLs). Unlike GWAS of clinical traits, where only a handful of phenotypes are analyzed per study, eQTL studies measure tens of thousands of gene expression levels and apply the GWAS approach to each one. This requires billions of statistical tests and substantial computational resources, particularly when novel statistical methods such as mixed models are applied. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing every SNP. In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and tests additional SNPs only in those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure speeds up computation by a factor of 10. Additionally, the efficient implementation of our software speeds up computation by a factor of 75 relative to state-of-the-art testing approaches.
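The two-stage idea can be illustrated with a minimal sketch. This is not the authors' exact procedure. It assumes the association z-scores of a proxy and a nearby SNP are bivariate standard normal with correlation r (taken from linkage disequilibrium). Under that model, the SNP's z-score given the observed proxy z-score is normal with mean r·z_proxy and variance 1 − r². It also assumes each SNP is assigned to a single proxy. A SNP is tested in the second stage only if its conditional probability of exceeding the significance threshold is at least a cutoff `min_prob`. The names `two_stage`, `prob_exceeds`, `snp_to_proxy`, `test_snp`, and `min_prob` are hypothetical. Unlike the exhaustive guarantee claimed above, this simple rule only ensures that each skipped SNP has, under the model, less than `min_prob` conditional probability of being significant.

```python
import math

def normal_sf(x: float) -> float:
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def prob_exceeds(z_proxy: float, r: float, threshold: float) -> float:
    """P(|Z_snp| > threshold | Z_proxy = z_proxy), ASSUMING (Z_proxy, Z_snp)
    is bivariate standard normal with correlation r."""
    mean = r * z_proxy
    sd = math.sqrt(max(0.0, 1.0 - r * r))
    if sd == 0.0:  # perfect LD: the SNP's z-score equals +/- the proxy's
        return 1.0 if abs(mean) > threshold else 0.0
    # Two-sided test: add both tails of the conditional normal.
    return normal_sf((threshold - mean) / sd) + normal_sf((threshold + mean) / sd)

def two_stage(proxies, snp_to_proxy, test_snp, threshold, min_prob):
    """Hypothetical two-stage scan.
    proxies:      dict proxy_id -> z-score from stage 1
    snp_to_proxy: dict snp_id -> (proxy_id, r), one proxy per SNP (assumption)
    test_snp:     callable snp_id -> z-score (the expensive stage-2 test)
    Returns (set of significant IDs, number of stage-2 tests performed)."""
    significant = {p for p, z in proxies.items() if abs(z) > threshold}
    n_tested = 0
    for snp, (proxy, r) in snp_to_proxy.items():
        # Only test SNPs whose proxy makes significance plausible.
        if prob_exceeds(proxies[proxy], r, threshold) >= min_prob:
            n_tested += 1
            if abs(test_snp(snp)) > threshold:
                significant.add(snp)
    return significant, n_tested

if __name__ == "__main__":
    proxies = {"p1": 6.0, "p2": 0.5}
    snp_to_proxy = {"s1": ("p1", 0.9), "s2": ("p2", 0.9), "s3": ("p1", 0.3)}
    true_z = {"s1": 5.8, "s2": 0.2, "s3": 1.0}  # made-up toy values
    sig, n = two_stage(proxies, snp_to_proxy, true_z.__getitem__,
                       threshold=5.45, min_prob=0.01)
    print(sorted(sig), n)  # ['p1', 's1'] 1
```

In this toy run, only `s1` is tested in the second stage, because it is strongly correlated with a proxy whose z-score is already near the threshold. `s2` and `s3` are skipped. The threshold 5.45 roughly corresponds to a two-sided p-value of 5×10⁻⁸. Raising `min_prob` skips more tests at the cost of a higher chance of missing a significant SNP.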