Initialization parameter sweep in ATHENA: optimizing neural networks for detecting gene-gene interactions in the presence of small main effects

  • Authors: Emily Rose Holzinger; Carrie C. Buchanan; Scott M. Dudek; Eric C. Torstenson; Stephen D. Turner; Marylyn D. Ritchie
  • Affiliation (all authors): Vanderbilt University Medical Center, Nashville, TN, USA
  • Venue: Proceedings of the 12th annual conference on Genetic and evolutionary computation
  • Year: 2010

Abstract

Recent advances in genotyping technology have led to the generation of an enormous quantity of genetic data. Traditional methods of statistical analysis have proved insufficient for extracting all of the information about the genetic components of common, complex human diseases. A contributing factor is that, alongside the small main effect of each single gene on disease susceptibility, there are non-linear gene-gene interactions that traditional parametric analyses can find difficult to detect. In addition, exhaustively searching all multi-locus combinations has proved computationally impractical. Novel analysis strategies have been developed to address these issues. The Analysis Tool for Heritable and Environmental Network Associations (ATHENA) is an analytical tool that incorporates grammatical evolution neural networks (GENN) to detect interactions among genetic factors. Initialization parameters define how the evolutionary process is carried out. This research addresses how different parameter settings affect the detection of disease models involving interactions. In the current study, we iterate over multiple parameter values to determine which combinations appear optimal for detecting interactions in simulated data for multiple genetic models. Our results indicate that the factors with the greatest influence on detection are input variable encoding, population size, and parallel computation.
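
The parameter sweep described in the abstract can be pictured as a grid search over initialization settings. The following Python sketch is only an illustration under stated assumptions: the parameter names, the candidate values, and the `evaluate` callback are hypothetical and are not taken from ATHENA or from the study. In practice, `evaluate` would stand for running GENN on simulated data sets and returning some measure of detection performance.

```python
from itertools import product

# Hypothetical grid: names and values are illustrative assumptions,
# not the settings examined in the study.
PARAM_GRID = {
    "encoding": ["additive", "dummy"],
    "population_size": [100, 1000],
    "parallel_demes": [1, 4],
}


def sweep(grid, evaluate):
    """Score every combination of settings in `grid`.

    `evaluate` is a caller-supplied function that takes one settings
    dict and returns a number (higher is better). Returns a list of
    (score, settings) pairs sorted from best to worst.
    """
    names = list(grid)
    results = []
    # product() yields one tuple per combination of candidate values.
    for values in product(*(grid[name] for name in names)):
        settings = dict(zip(names, values))
        results.append((evaluate(settings), settings))
    # Sort on the score only, so settings dicts are never compared.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results
```

With three parameters at two values each, the grid holds 2 × 2 × 2 = 8 combinations. The number of runs grows multiplicatively with each added parameter, which is one reason a sweep of this kind is usually restricted to a small set of candidate values.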