Comparison of methods for meta-dimensional data analysis using in silico and biological data sets

Authors:
Emily R. Holzinger; Scott M. Dudek; Alex T. Frase; Brooke Fridley; Prabhakar Chalise; Marylyn D. Ritchie
Affiliations:
Center for Human Genetics Research, Vanderbilt University, Nashville, TN; Center for Human Genetics Research, Vanderbilt University, Nashville, TN; Center for Systems Genomics, Pennsylvania State University, University Park, PA; Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN; Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN; Center for Systems Genomics, Pennsylvania State University, University Park, PA
Venue:
EvoBIO'12 Proceedings of the 10th European conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
Year:
2012

Abstract

Recent technological innovations have catalyzed the generation of massive amounts of data at various levels of biological regulation, including DNA, RNA and protein. Because biological systems are complex, the underlying model may only be discovered by integrating different types of high-throughput data in a "meta-dimensional" analysis. In this study, we used simulated gene expression and genotype data to compare three methods that show potential for integrating different data types into models that predict a given phenotype: the Analysis Tool for Heritable and Environmental Network Associations (ATHENA), Random Jungle (RJ), and Lasso. Based on these results, we applied RJ and ATHENA sequentially to a biological data set of genome-wide genotypes and gene expression levels from lymphoblastoid cell lines (LCLs) to predict cytotoxicity. The best model, consisting of two SNPs and two gene expression variables, had an r-squared value of 0.32.
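To make the idea concrete, the sketch below shows one way a method such as Lasso can be applied to meta-dimensional data. Genotype columns (coded as 0/1/2 minor-allele counts) and gene expression columns are simply concatenated into one feature matrix. A Lasso model is then fit by cyclic coordinate descent, and the in-sample r-squared is reported. This is a minimal illustration under stated assumptions and is not the authors' simulation design or Lasso implementation. The toy data generator, the effect sizes, the penalty `lam=0.2`, column-wise concatenation as the integration step, and all function names (`simulate`, `lasso_cd`, and so on) are hypothetical. RJ and ATHENA are not reproduced here.

```python
import random


def standardize(columns):
    """Center and scale each feature column to mean 0 and variance 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0  # guard constant columns
        out.append([(v - mean) / sd for v in col])
    return out


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(columns, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    Assumes columns are standardized (mean 0, variance 1) and y is centered,
    so no intercept is fitted.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of column j with the partial residual (j's own term added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_tot = sum((v - mean) ** 2 for v in y)
    ss_res = sum((v - h) ** 2 for v, h in zip(y, y_hat))
    return 1.0 - ss_res / ss_tot


def simulate(n=200, n_snps=5, n_genes=5, maf=0.3, seed=1):
    """Hypothetical toy data: 0/1/2 SNP codes plus Gaussian expression levels.

    The phenotype depends only on snp0 and expr0 (made-up effect sizes).
    """
    rng = random.Random(seed)
    snps = [[sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
            for _ in range(n_snps)]
    expr = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_genes)]
    y = [0.8 * snps[0][i] + 1.0 * expr[0][i] + rng.gauss(0.0, 1.0) for i in range(n)]
    names = [f"snp{k}" for k in range(n_snps)] + [f"expr{k}" for k in range(n_genes)]
    return snps + expr, y, names  # concatenate the two data types column-wise


columns, y, names = simulate()
X = standardize(columns)
y_mean = sum(y) / len(y)
y_c = [v - y_mean for v in y]
beta = lasso_cd(X, y_c, lam=0.2)
y_hat = [y_mean + sum(b * col[i] for b, col in zip(beta, X)) for i in range(len(y))]
selected = {name: round(b, 3) for name, b in zip(names, beta) if b != 0.0}
print("selected:", selected)
print("r-squared:", round(r_squared(y, y_hat), 3))
```

The L1 penalty drives most coefficients to exactly zero. The variables that survive therefore form a small mixed model of SNPs and expression features, which is the kind of model the abstract reports. Because the r-squared here is computed on the training data, it will tend to be optimistic. A held-out or cross-validated estimate would be the fairer comparison.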