Individualized predictions of survival time distributions from gene expression data using a Bayesian MCMC approach

  • Authors:
  • Lars Kaderali

  • Affiliations:
  • German Cancer Research Center, Theoretical Bioinformatics, Heidelberg, Germany

  • Venue:
  • BIRD'07 Proceedings of the 1st international conference on Bioinformatics research and development
  • Year:
  • 2007

Abstract

It has previously been demonstrated that gene expression data correlate with event-free and overall survival in several cancers. A number of methods exist that assign patients to different risk classes based on the expression profiles of their tumors. However, predictions of actual survival times in years for the individual patient, together with confidence intervals on those predictions, would provide a far more detailed view and could aid the clinician considerably in evaluating different treatment options. Similarly, a method able to make such predictions could be analyzed to infer knowledge about the relevant disease genes, hinting at potential disease pathways and pointing to relevant targets for drug design. Here too, confidence estimates for the relevance values of the individual genes would be useful. Our algorithm to tackle these questions builds on a hierarchical Bayesian approach, combining a Cox regression model with a hierarchical prior distribution on the regression parameters for feature selection. This prior enables the method to deal efficiently with the small sample sizes and high dimensionality characteristic of microarray datasets. We then sample from the posterior distribution over a patient's survival time, given gene expression measurements and training data. This enables us to make statements such as "with probability 0.6, the patient will survive between 3 and 4 years". A similar approach is used to compute relevance values with confidence intervals for the individual genes measured. The method is evaluated on a simulated dataset, showing the feasibility of the approach. We then apply the algorithm to a publicly available dataset on diffuse large B-cell lymphoma, a cancer of the lymphocytes, and demonstrate that it successfully predicts survival times and survival time distributions for the individual patient.
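
As a rough illustration of how statements such as "with probability 0.6, the patient will survive between 3 and 4 years" can be read off a set of posterior draws, the sketch below estimates an interval probability and an equal-tailed credible interval from samples of one patient's survival time. It does not reproduce the paper's sampler: the function names `interval_probability` and `credible_interval` are hypothetical, and the log-normal draws are only stand-ins for the output of an MCMC run.

```python
import random


def interval_probability(samples, lower, upper):
    """Fraction of posterior draws that fall in [lower, upper)."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    hits = sum(1 for t in samples if lower <= t < upper)
    return hits / len(samples)


def credible_interval(samples, mass=0.95):
    """Equal-tailed interval covering roughly `mass` of the draws."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - mass) / 2.0
    lo_idx = int(tail * (n - 1))
    hi_idx = int((1.0 - tail) * (n - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Stand-in draws (assumption): in practice these would be the sampled
# survival times, in years, for one patient given the training data.
rng = random.Random(0)
draws = [rng.lognormvariate(1.2, 0.3) for _ in range(10000)]

p_3_to_4 = interval_probability(draws, 3.0, 4.0)
low, high = credible_interval(draws, mass=0.95)
print(f"P(3 <= T < 4 years) is approximately {p_3_to_4:.2f}")
print(f"95% credible interval: {low:.2f} to {high:.2f} years")
```

The same two summaries could be applied to posterior draws of a regression weight to obtain a relevance value with an interval for an individual gene.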