Distribution modeling and simulation of gene expression data

Authors:
Rudolph S. Parrish; Horace J. Spencer, III; Ping Xu
Affiliations:
Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, 555 S. Floyd St, Suite 4026, Louisville, KY 40292, USA; Department of Biostatistics, College of Public Health, University of Arkansas for Medical Sciences, West Markham St., Little Rock, AR, USA; Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, 555 S. Floyd St, Suite 4026, Louisville, KY 40292, USA
Venue:
Computational Statistics & Data Analysis
Year:
2009

Abstract

Data derived from gene expression microarrays are often used for classification and discovery. Many methods have been proposed for accomplishing these and related aims; however, the statistical properties of such methods generally are not well established. It is therefore desirable to develop realistic mathematical and statistical models that can be used in simulation, so that the impact of data analysis methods and testing approaches can be assessed. A method is developed in which variation among arrays can be characterized simultaneously for a large number of genes, resulting in a multivariate model of gene expression. The method is based on selecting mathematical transformations of the underlying expression measures such that the transformed variables approximately follow a Gaussian distribution, and then estimating the associated parameters, including correlations. The result is a multivariate normal distribution that models transformed gene expression values within a subject population while accounting for covariances among genes and/or probes. This model is then used to simulate microarray expression and probe intensity data by employing a modified Cholesky matrix factorization technique, which addresses the singularity problem in the "small n, big p" situation. An example is given using prostate cancer data and, as an illustration, it is shown how data normalization can be investigated using this approach.
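
The simulation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the expression values have already been transformed to an approximately Gaussian scale, and it replaces the modified Cholesky factorization with a much simpler diagonal-shift fallback, which is a different technique. The names `shifted_cholesky` and `simulate_expression`, the toy data, and all parameter values are hypothetical.

```python
import numpy as np

def shifted_cholesky(cov, start=1e-10, factor=10.0, max_tries=30):
    """Return (L, shift) with L lower triangular and L @ L.T == cov + shift * I.

    Assumption: a plain diagonal shift stands in for a modified Cholesky
    factorization. The shift grows until the factorization succeeds.
    """
    p = cov.shape[0]
    scale = np.mean(np.diag(cov))  # express the shift relative to the data scale
    rel = 0.0
    for _ in range(max_tries):
        try:
            shift = rel * scale
            return np.linalg.cholesky(cov + shift * np.eye(p)), shift
        except np.linalg.LinAlgError:
            rel = start if rel == 0.0 else rel * factor
    raise np.linalg.LinAlgError("no positive definite shift found")

def simulate_expression(transformed, n_sim, rng):
    """Simulate n_sim arrays from a Gaussian fitted to transformed data.

    transformed: array of shape (n_arrays, n_genes), assumed roughly Gaussian.
    """
    mean = transformed.mean(axis=0)
    cov = np.cov(transformed, rowvar=False)  # rank deficient when n_arrays <= n_genes
    chol, _ = shifted_cholesky(cov)
    z = rng.standard_normal((n_sim, transformed.shape[1]))
    return mean + z @ chol.T  # each row is a draw from N(mean, cov + shift * I)

# Toy "small n, big p" input: 10 arrays, 50 genes (made-up values).
rng = np.random.default_rng(0)
observed = rng.normal(loc=8.0, scale=1.0, size=(10, 50))
simulated = simulate_expression(observed, n_sim=200, rng=rng)
```

Simulated values are on the transformed scale. Returning to the intensity scale would require applying the inverse of whatever transformation was chosen for each gene or probe.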