A model free method to generate human genetics datasets with complex gene-disease relationships

Authors:
Casey S. Greene;Daniel S. Himmelstein;Jason H. Moore
Affiliations:
Dartmouth Medical School, Lebanon, NH;Dartmouth Medical School, Lebanon, NH;Dartmouth Medical School, Lebanon, NH
Venue:
EvoBIO'10 Proceedings of the 8th European conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
Year:
2010

Abstract

A goal of human genetics is to discover genetic factors that influence individuals’ susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components rather than from single-component failures. This greatly complicates both the task of selecting informative genetic variations and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Until now, these methods have been evaluated with datasets simulated according to pre-defined genetic models. Here we develop and evaluate a model-free evolution strategy to generate datasets that display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. Specifically, we generate six hundred Pareto fronts, one for each independent run of our algorithm. In each run, the predictiveness of single genetic variations and of pairs of genetic variations is minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This could improve our ability to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We also make the entire Pareto-optimal front of datasets from each run freely available to the community so that novel methods may be rigorously evaluated. These 56,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.
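
To make the multi-objective setup in the abstract concrete, the following minimal Python sketch scores a candidate dataset on three objectives (one-way and two-way predictiveness, to be minimized, and higher-order predictiveness, to be maximized) and filters a set of scored candidates down to its Pareto-optimal front. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how predictiveness is measured, so the majority-vote accuracy used here is a stand-in, and the names `predictiveness`, `best_predictiveness`, `objectives`, `dominates`, and `pareto_front` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def predictiveness(genotypes, status, columns):
    """Accuracy of a majority-vote rule over the genotype cells of `columns`.

    Assumption: this is a stand-in measure, not necessarily the one used in
    the paper. Each observed genotype combination predicts its most common
    disease status, and the fraction of correctly labelled rows is returned.
    """
    cells = defaultdict(Counter)
    for row, label in zip(genotypes, status):
        cells[tuple(row[c] for c in columns)][label] += 1
    correct = sum(max(counts.values()) for counts in cells.values())
    return correct / len(status)


def best_predictiveness(genotypes, status, order):
    """Highest predictiveness over all attribute combinations of a given order."""
    n_attributes = len(genotypes[0])
    return max(
        predictiveness(genotypes, status, cols)
        for cols in combinations(range(n_attributes), order)
    )


def objectives(genotypes, status, high_order):
    """Return (one-way, two-way, high-order) predictiveness for one dataset."""
    return (
        best_predictiveness(genotypes, status, 1),
        best_predictiveness(genotypes, status, 2),
        best_predictiveness(genotypes, status, high_order),
    )


def dominates(a, b):
    """True if score `a` Pareto-dominates score `b`.

    The first two objectives are minimized and the third is maximized, so the
    first two are negated to turn everything into a maximization problem.
    """
    a_key = (-a[0], -a[1], a[2])
    b_key = (-b[0], -b[1], b[2])
    return all(x >= y for x, y in zip(a_key, b_key)) and a_key != b_key


def pareto_front(scores):
    """Keep only the scores that no other score dominates."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]


# Toy dataset: disease status is the parity of three binary attributes, so no
# single attribute or pair is informative but the triple is fully predictive.
toy_genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
toy_status = [a ^ b ^ c for a, b, c in toy_genotypes]
toy_scores = objectives(toy_genotypes, toy_status, high_order=3)
```

In an evolution strategy of the kind the abstract describes, a population of candidate datasets would be mutated and rescored each generation, with the non-dominated candidates retained. The mutation and selection details are not given in the abstract and are omitted here.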