Bayesian hierarchical error model for analysis of gene expression data

  • Authors:
  • Hyungjun Cho;Jae K. Lee

  • Affiliations:
  • Division of Biostatistics and Epidemiology, Department of Health Evaluation Sciences, University of Virginia School of Medicine, Hospital West Complex, Room 3181, P.O. Box 800717, Charlottesville, ...;Division of Biostatistics and Epidemiology, Department of Health Evaluation Sciences, University of Virginia School of Medicine, Hospital West Complex, Room 3181, P.O. Box 800717, Charlottesville, ...

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Quantified Score

Hi-index 3.84

Abstract

Motivation: Analysis of genome-wide microarray data requires the estimation of a large number of genetic parameters for individual genes and their interaction expression patterns under multiple biological conditions. The sources of microarray error variability comprise various biological and experimental factors, such as biological and individual replication, sample preparation, hybridization and image processing. Moreover, the same gene often shows quite heterogeneous error variability under different biological and experimental conditions, and this variability must be estimated separately to evaluate the statistical significance of differential expression patterns. Widely used linear modeling approaches are limited because they do not allow simultaneous modeling and inference on the large number of these genetic parameters and on the heterogeneous error components across different genes, different biological and experimental conditions, and varying intensity ranges in microarray data.

Results: We propose a Bayesian hierarchical error model (HEM) to overcome the above restrictions. HEM accounts for heterogeneous error variability in an oligonucleotide microarray experiment. The error variability is decomposed into two components (experimental and biological errors) when both biological and experimental replicates are available. Our HEM inference is based on Markov chain Monte Carlo to estimate a large number of parameters from a single likelihood function for all genes. An F-like summary statistic is proposed to identify differentially expressed genes under multiple conditions based on the HEM estimation. The performance of HEM and its F-like statistic was examined with simulated data and two published microarray datasets: primate brain data and mouse B-cell development data. HEM was also compared with ANOVA using simulated data.

Availability: The software for the HEM is available from the authors upon request.
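The two-component error decomposition described in the Results can be illustrated with a small simulation. The sketch below is not the authors' HEM and does not use Markov chain Monte Carlo. It assumes a simple additive two-layer structure (an observed value equals a condition mean plus a biological error plus an experimental error), pools variances across genes, and recovers the two components with a method-of-moments estimator. All function names, parameter names and numeric values are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_two_layer(n_genes=500, n_cond=2, n_bio=3, n_exp=2,
                       sigma_bio=0.5, sigma_exp=0.2, seed=0):
    """Simulate log-expression with biological and experimental error layers.

    Assumed (hypothetical) structure: y = mean + biological error + experimental error.
    Returns an array of shape (n_genes, n_cond, n_bio, n_exp).
    """
    rng = np.random.default_rng(seed)
    # Hypothetical true mean for each gene under each condition.
    mean = rng.normal(8.0, 1.0, size=(n_genes, n_cond))
    # Biological layer: one latent value per biological replicate.
    x = mean[:, :, None] + rng.normal(0.0, sigma_bio,
                                      size=(n_genes, n_cond, n_bio))
    # Experimental layer: repeated measurements of each biological replicate.
    y = x[..., None] + rng.normal(0.0, sigma_exp,
                                  size=(n_genes, n_cond, n_bio, n_exp))
    return y

def decompose_variance(y):
    """Method-of-moments estimate of pooled biological and experimental variance."""
    n_exp = y.shape[-1]
    # Spread among experimental replicates of the same biological sample.
    var_exp = y.var(axis=-1, ddof=1).mean()
    # Spread among biological-replicate means; its expectation is
    # sigma_bio^2 + sigma_exp^2 / n_exp, so subtract the experimental share.
    var_means = y.mean(axis=-1).var(axis=-1, ddof=1).mean()
    var_bio = max(var_means - var_exp / n_exp, 0.0)
    return var_bio, var_exp

y = simulate_two_layer()
var_bio, var_exp = decompose_variance(y)
print(f"biological variance ~ {var_bio:.3f}, experimental variance ~ {var_exp:.3f}")
```

Pooling across genes keeps the sketch short, but it ignores the gene- and condition-specific heterogeneity that the abstract identifies as the main motivation for HEM. The separation is only possible because both replicate types are present, which matches the condition stated in the Results.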