Modified versions of Bayesian Information Criterion for genome-wide association studies

  • Authors:
  • Florian Frommlet;Felix Ruhaltinger;Piotr Twaróg;Małgorzata Bogdan

  • Affiliations:
  • Department of Medical Statistics, Medical University Vienna, Austria;Department of Medical Statistics, Medical University Vienna, Austria;Institute of Mathematics and Computer Science, Wrocław University of Technology, Poland;Institute of Mathematics and Computer Science, Wrocław University of Technology, Poland

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2012

Quantified Score

Hi-index 0.03

Visualization

Abstract

For the vast majority of genome-wide association studies (GWAS) statistical analysis was performed by testing markers individually. Elementary statistical considerations clearly show that in the case of complex traits an approach based on multiple regression or generalized linear models is preferable to testing single markers. A model selection approach to GWAS can be based on modifications of the Bayesian Information Criterion (BIC), where some search strategies are necessary to deal with a huge number of potential models. Comprehensive simulations based on real SNP data confirm that model selection has larger power to detect causal SNPs in complex models than single-marker tests. Furthermore, testing single markers leads to substantial problems with proper ranking of causal SNPs and tends to detect a certain number of false positive SNPs, which are not linked to any of the causal mutations. This behavior of single-marker tests is typical in GWAS for complex traits and can be explained by an aggregated influence of many small random sample correlations between genotypes of the SNP under investigation and other causal SNPs. These findings might at least partially explain problems with low power and nonreplicability of results in GWAS. A real data analysis illustrates advantages of model selection in practice, where publicly available gene expression data as traits for individuals from the HapMap project are reanalyzed.