An empirical approach to model selection through validation for censored survival data

  • Authors:
  • Ickwon Choi;Brian J. Wells;Changhong Yu;Michael W. Kattan

  • Affiliations:
  • Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA;Department of Quantitative Health Sciences, Cleveland Clinic Foundation, 9500 Euclid Avenue, JJN3, Cleveland, OH 44195, USA;Department of Quantitative Health Sciences, Cleveland Clinic Foundation, 9500 Euclid Avenue, JJN3, Cleveland, OH 44195, USA;Department of Quantitative Health Sciences, Cleveland Clinic Foundation, 9500 Euclid Avenue, JJN3, Cleveland, OH 44195, USA

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2011

Abstract

Medical prognostic models predict the future course or outcome of a disease after diagnosis or treatment. When a prediction model is built from right-censored time-to-event data, advocates of the full model may reject existing variable selection methods because of their estimation bias and selection bias. If the objective is to optimize predictive performance by some criterion, however, we can often obtain a reduced model that accepts a little bias in exchange for low variance and whose overall performance is better. To accomplish this goal, we propose a new two-stage variable selection approach that combines Stepwise Tuning in the Maximum Concordance Index (STMC) with Forward Nested Subset Selection (FNSS). In the first stage, the proposed variable selection identifies the best subset of risk factors as measured by the concordance index, using inner cross-validation for optimism correction within the outer loop of cross-validation; this can yield a different final model for each fold. In the second stage, we feed these intermediate results into another selection method to resolve the overfitting problem and to select a single final model from the variation in predictors across the selected models. Two case studies on survival data sets of relatively different sizes, together with a simulation study, demonstrate that, given a sufficient sample size and number of events, the proposed approach selects an improved and reduced average model compared with other selection methods such as stepwise selection using the likelihood ratio test, the Akaike Information Criterion (AIC), and the lasso. Finally, the final model obtained for each data set is better than the corresponding full model by most measures. Both the model selection methods and the final models are assessed in a systematic validation scheme that estimates their independent performance.
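
The abstract names the concordance index as the selection criterion but does not define it. As a rough illustration only, the sketch below computes a Harrell-style concordance index for right-censored data in plain Python. The function name `concordance_index`, the handling of ties, and the toy data are assumptions made for this example; the paper's exact estimator and its STMC and FNSS procedures are not reproduced here.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style concordance index for right-censored data (illustrative sketch).

    times        observed follow-up times
    events       1 (or True) if the event was observed, 0 if censored
    risk_scores  model output, where a higher score means higher predicted risk

    A pair is usable only when the subject with the shorter observed time
    had an event; otherwise the true ordering of the pair is unknown.
    Assumption of this sketch: pairs with tied times are skipped and
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times are ignored in this simplified version
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: cannot compute the concordance index")
    return concordant / usable


# Toy data (hypothetical): four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 1.0 means every usable pair is ranked correctly, and 0.5 corresponds to random ranking. In a selection procedure of the kind the abstract describes, a score like this would be evaluated on held-out folds rather than on the data used to fit the model, so that the estimate is corrected for optimism.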