Bayesian variable selection for the analysis of microarray data with censored outcomes

  • Authors:
  • Naijun Sha; Mahlet G. Tadesse; Marina Vannucci

  • Affiliations:
  • Department of Mathematical Sciences, University of Texas at El Paso, El Paso, TX 79968, USA; Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Statistics, Texas A&M University, College Station, TX 77843, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2006

Abstract

Motivation: A common task in microarray data analysis consists of identifying genes associated with a phenotype. When the outcomes of interest are censored time-to-event data, standard approaches assess the effect of genes by fitting univariate survival models. In this paper, we propose a Bayesian variable selection approach, which allows the identification of relevant markers by jointly assessing sets of genes. We consider accelerated failure time (AFT) models with log-normal and log-t distributional assumptions. A data augmentation approach is used to impute the failure times of censored observations, and mixture priors are used for the regression coefficients to identify promising subsets of variables. The proposed method provides a unified procedure for the selection of relevant genes and the prediction of survivor functions.

Results: We demonstrate the performance of the method on simulated examples and on several microarray datasets. For the simulation study, we consider scenarios with a large number of noisy variables and different degrees of correlation between the relevant and non-relevant (noisy) variables. We are able to identify the correct covariates and obtain good prediction of the survivor functions. For the microarray applications, some of our selected genes are known to be related to the diseases under study and a few are in agreement with findings from other researchers.

Availability: The Matlab code for implementing the Bayesian variable selection method may be obtained from the corresponding author.

Contact: mvannucci@stat.tamu.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.
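As a rough illustration of the data augmentation idea mentioned in the abstract, the sketch below imputes the log failure times of right-censored observations under a log-normal AFT model by drawing from a normal distribution truncated below at the log censoring time. It is not the authors' Matlab code: the function name `impute_log_times`, its arguments, and the toy values are assumptions made for illustration, and the variable selection step with mixture priors is not shown.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def impute_log_times(log_obs, censored, mu, sigma, rng):
    """One illustrative data-augmentation step for a log-normal AFT model.

    log_obs  : log of the observed time (failure or censoring) per subject
    censored : True where the observation is right-censored
    mu       : current linear predictor (x_i' beta) per subject
    sigma    : current error standard deviation
    rng      : a random.Random instance
    """
    imputed = []
    for y, is_cens, m in zip(log_obs, censored, mu):
        if not is_cens:
            imputed.append(y)  # failure time observed, nothing to impute
            continue
        # Inverse-CDF draw from N(m, sigma^2) truncated to (y, infinity).
        lower = _STD_NORMAL.cdf((y - m) / sigma)
        u = lower + (1.0 - lower) * rng.random()
        # Keep the argument of inv_cdf strictly inside (0, 1).
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        draw = m + sigma * _STD_NORMAL.inv_cdf(u)
        # Guard against rounding in the far tail: never go below y.
        imputed.append(max(y, draw))
    return imputed


# Toy usage with made-up values (hypothetical, not from the paper).
rng = random.Random(0)
log_obs = [1.2, 0.5, 2.0]
censored = [False, True, True]
mu = [1.0, 1.0, 1.5]
w = impute_log_times(log_obs, censored, mu, 0.8, rng)
```

In a full sampler, a step of this kind would alternate with updates of the regression coefficients and the set of selected variables, with the imputed values treated as if they were observed log failure times.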