A flexible approximate likelihood ratio test for detecting differential expression in microarray data

  • Authors:
  • Ahmed Hossain;Joseph Beyene;Andrew R. Willan;Pingzhao Hu

  • Affiliations:
  • Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, ON M5T 3M7, Canada and Biostatistics Methodology Unit, Program in Child Health Evaluative Sciences, SickKids ...;Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, ON M5T 3M7, Canada and Biostatistics Methodology Unit, Program in Child Health Evaluative Sciences, SickKids ...;Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, ON M5T 3M7, Canada and Biostatistics Methodology Unit, Program in Child Health Evaluative Sciences, SickKids ...;Program in Genetics & Genome Biology, SickKids Research Institute, 555 University Avenue, Toronto, ON M5G 1X8, Canada

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2009

Abstract

Identifying differentially expressed genes in microarray data has been studied extensively, and several methods have been proposed. Most popular methods for analyzing gene expression microarray data rely on the assumption of normality and are based on a Wald statistic. These methods may be inefficient when expression levels follow a skewed distribution. To deal with possible violations of the normality assumption, we propose a method based on the Generalized Logistic Distribution of Type II (GLDII). The motivation for this distributional assumption is to allow longer tails than the normal distribution, which is important in analyzing gene expression data because extreme values are common in such experiments. The shape parameter of the GLDII allows flexibility in modeling a wide range of distributions. To reduce the computational burden of carrying out Likelihood Ratio (LR) tests for several thousand genes, an Approximate LR Test (ALRT) is proposed. We also generalize the two-class ALRT method to multi-class microarray data. The performance of the ALRT method under the GLDII assumption is compared by simulation with methods based on Wald-type statistics. The simulation results show that our method performs well relative to the significance analysis of microarrays (SAM) approach using standardized Wilcoxon rank statistics and to empirical Bayes (E-B) t-statistics. Our method is also less sensitive to extreme values. We illustrate our method using two publicly available gene expression data sets.
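The GLDII likelihood and the LR-test idea described above can be illustrated with a small sketch. It assumes the standard Type II generalized logistic density with location mu, scale sigma and shape alpha, f(x) = (alpha/sigma) * exp(-alpha*z) / (1 + exp(-z))^(alpha+1) with z = (x - mu)/sigma, which reduces to the logistic distribution when alpha = 1; the paper's exact parametrization may differ. As a simplifying assumption, sigma and alpha are treated as known and only the location is estimated, and the test is an ordinary (not approximate) two-class LR test referred to a chi-square distribution with 1 degree of freedom. This is not the authors' ALRT, and the function names (`gld2_logpdf`, `mle_location`, `lr_test_two_groups`) are hypothetical.

```python
import math
from typing import Sequence, Tuple


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t: float) -> float:
    """Numerically stable 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def gld2_logpdf(x: float, mu: float, sigma: float, alpha: float) -> float:
    """Log density of the Type II generalized logistic distribution
    (assumed standard parametrization; alpha = 1 gives the logistic)."""
    z = (x - mu) / sigma
    return math.log(alpha) - math.log(sigma) - alpha * z - (alpha + 1.0) * softplus(-z)


def loglik(xs: Sequence[float], mu: float, sigma: float, alpha: float) -> float:
    """Sum of GLDII log densities over one group of expression values."""
    return sum(gld2_logpdf(x, mu, sigma, alpha) for x in xs)


def mle_location(xs: Sequence[float], sigma: float, alpha: float) -> float:
    """MLE of mu with sigma and alpha held fixed (hypothetical simplification).

    The log-likelihood is concave in mu, so its score is strictly decreasing
    and bisection on the score finds the unique maximizer."""
    def score(mu: float) -> float:
        # Proportional to d/dmu of the log-likelihood (positive factor 1/sigma dropped).
        return sum(alpha - (alpha + 1.0) * sigmoid((mu - x) / sigma) for x in xs)

    lo, hi = min(xs) - 50.0 * sigma, max(xs) + 50.0 * sigma
    for _ in range(200):  # fixed iteration count avoids float stalls
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # maximizer lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lr_test_two_groups(g1: Sequence[float], g2: Sequence[float],
                       sigma: float, alpha: float) -> Tuple[float, float]:
    """Exact LR test of H0: mu1 == mu2 against H1: mu1 != mu2 for one gene."""
    pooled = list(g1) + list(g2)
    ll0 = loglik(pooled, mle_location(pooled, sigma, alpha), sigma, alpha)
    ll1 = (loglik(g1, mle_location(g1, sigma, alpha), sigma, alpha)
           + loglik(g2, mle_location(g2, sigma, alpha), sigma, alpha))
    stat = max(0.0, 2.0 * (ll1 - ll0))  # guard against tiny negative round-off
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    g1 = [0.1, -0.4, 0.3, 0.0, 0.2]
    g2 = [3.1, 2.6, 3.3, 3.0, 3.2]
    stat, p = lr_test_two_groups(g1, g2, sigma=0.5, alpha=2.0)
    print(f"LR statistic = {stat:.2f}, p-value = {p:.3g}")
```

Because the GLDII log-likelihood is concave in the location, plain bisection on the score is enough here. In a real analysis sigma and alpha would also have to be estimated numerically for each gene under both hypotheses, which is the kind of per-gene cost that an approximate LR test is meant to reduce.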