Double-blind evaluation and benchmarking of survival models in a multi-centre study

  • Authors:
  • A. Taktak; L. Antolini; M. Aung; P. Boracchi; I. Campbell; B. Damato; E. Ifeachor; N. Lama; P. Lisboa; C. Setzkorn; V. Stalbovskaya; E. Biganzoli

  • Affiliations:
  • Department of Clinical Engineering, Royal Liverpool University Hospital, 1st Floor, Duncan Building, Daulby Street, Liverpool, UK; Unità di Statistica Medica e Biometria, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy; School of Mathematics and Computing Sciences, Liverpool John Moores University, Liverpool, UK; Istituto di Statistica Medica e Biometria, Università degli Studi di Milano, Milan, Italy; IC Statistical Services, Wirral, UK; St. Paul's Eye Unit, Royal Liverpool University Hospital, Liverpool, UK; School of Communications and Electronics, University of Plymouth, Plymouth, UK; Istituto di Statistica Medica e Biometria, Università degli Studi di Milano, Milan, Italy; School of Mathematics and Computing Sciences, Liverpool John Moores University, Liverpool, UK; Department of Clinical Engineering, Royal Liverpool University Hospital, 1st Floor, Duncan Building, Daulby Street, Liverpool, UK; School of Communications and Electronics, University of Plymouth, Plymouth, UK; Unità di Statistica Medica e Biometria, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2007

Abstract

Accurate modelling of time-to-event data is of particular importance for both exploratory and predictive analysis in cancer, and can have a direct impact on clinical care. This study presents a detailed double-blind evaluation of the out-of-sample accuracy of mortality predictions from two generic non-linear models based on artificial neural networks, benchmarked against a partial logistic spline model, a log-normal model and Cox regression. A data set containing 2880 samples was shared over the Internet using a purpose-built secure environment called GEOCONDA (www.geoconda.com). The evaluation was carried out in three parts. The first compared the predicted survival estimates for each of the four survival groups defined by the TNM staging system against the empirical Kaplan-Meier estimates. The second focused on the accuracy of survival predictions over time, quantified with the time-dependent concordance index ($C^{td}$). Finally, calibration plots were obtained over the range of follow-up and tested using a generalization of the Hosmer-Lemeshow test. All models showed satisfactory performance, with $C^{td}$ values of about 0.7. None of the models showed a systematic tendency towards over- or underestimation of the observed survival at $t = 3$ and $t = 5$ years. At $t = 10$ years, all models except Cox regression underestimated the observed survival; Cox regression overestimated it. The study presents a robust and unbiased benchmarking methodology using a bespoke web facility. It was concluded that recent, powerful, flexible modelling algorithms achieve predictive performance comparable to that of more established methods from the medical and biological literature on the reference data set.
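Two of the quantities named in the abstract, the Kaplan-Meier estimate and the time-dependent concordance index, can be illustrated with a minimal Python sketch. It assumes right-censored data given as parallel lists of follow-up times and event indicators (1 = death, 0 = censored), with each model's output given as one predicted survival function per subject. `kaplan_meier` is the standard product-limit estimator. `c_td` follows the usual pairwise definition of the index: a pair (i, j) is comparable when subject i dies at t_i and subject j is observed beyond t_i (or censored exactly at t_i), and it is concordant when S_i(t_i) < S_j(t_i). Tied predictions are not credited. The function names, the input layout, the exponential "model" and the toy data are hypothetical illustrations, not the study's code or data.

```python
import math
from typing import Callable, List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, S(time)) at each distinct death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    k = 0
    while k < len(data):
        t = data[k][0]
        deaths = removed = 0
        # Group every subject (death or censoring) that shares this time.
        while k < len(data) and data[k][0] == t:
            deaths += data[k][1]
            removed += 1
            k += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Subjects who died or were censored at t leave the risk set afterwards.
        at_risk -= removed
    return curve


def c_td(times: Sequence[float], events: Sequence[int],
         survival_fns: Sequence[Callable[[float], float]]) -> float:
    """Time-dependent concordance: concordant pairs / comparable pairs."""
    comparable = concordant = 0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # only an observed death can anchor a pair
        s_ii = survival_fns[i](t_i)
        for j, (t_j, d_j) in enumerate(zip(times, events)):
            if j == i:
                continue
            # j must be observed beyond t_i, or censored exactly at t_i.
            if t_j > t_i or (t_j == t_i and not d_j):
                comparable += 1
                # Concordant: the earlier death had lower predicted survival at t_i.
                if s_ii < survival_fns[j](t_i):
                    concordant += 1
    return concordant / comparable if comparable else float("nan")


def make_exponential(rate: float) -> Callable[[float], float]:
    """Hypothetical model output for one subject: S(t) = exp(-rate * t)."""
    return lambda t: math.exp(-rate * t)


if __name__ == "__main__":
    # Hypothetical toy data: follow-up in years, 1 = death, 0 = censored.
    times = [2.0, 4.0, 6.0, 8.0]
    events = [1, 1, 0, 1]
    print(kaplan_meier(times, events))
    fns = [make_exponential(r) for r in (0.4, 0.3, 0.2, 0.1)]
    print(c_td(times, events, fns))  # 1.0: every comparable pair is concordant
```

In this toy case the predicted curves never cross, so the ranking of subjects is the same at every time and the value coincides with Harrell's C. The time-dependent form matters for models such as neural networks, whose predicted survival curves can cross, because the comparison is made at each subject's own event time.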