An empirical approach to model selection through validation for censored survival data

Authors:
Ickwon Choi; Brian J. Wells; Changhong Yu; Michael W. Kattan
Affiliations:
Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA; Department of Quantitative Health Sciences, Cleveland Clinic Foundation, 9500 Euclid Avenue, JJN3, Cleveland, OH 44195, USA; Department of Quantitative Health Sciences, Cleveland Clinic Foundation, 9500 Euclid Avenue, JJN3, Cleveland, OH 44195, USA; Department of Quantitative Health Sciences, Cleveland Clinic Foundation, 9500 Euclid Avenue, JJN3, Cleveland, OH 44195, USA
Venue:
Journal of Biomedical Informatics
Year:
2011

Abstract

Medical prognostic models can be designed to predict the future course or outcome of a disease after diagnosis or treatment. When building a prediction model from right-censored time-to-event data, advocates of the full model may reject existing variable selection methods because of their estimation bias and selection bias. If the objective is to optimize predictive performance by some criterion, however, we can often obtain a reduced model that has a little bias and low variance, and whose overall performance is improved. To accomplish this goal, we propose a new two-stage variable selection approach that combines Stepwise Tuning in the Maximum Concordance Index (STMC) with Forward Nested Subset Selection (FNSS). In the first stage, the proposed variable selection identifies the best subset of risk factors, optimized with the concordance index, using inner cross-validation for optimism correction within the outer loop of cross-validation; this can yield a different final model for each fold. In the second stage, we feed these intermediate results into another selection method to resolve the overfitting problem and to select a final model from the variation of predictors across the selected models. Two case studies on survival data sets of relatively different sizes, together with a simulation study, demonstrate that, given a sufficient sample size and number of events, the proposed approach selects an improved and reduced average model compared with other selection methods such as stepwise selection using the likelihood ratio test, the Akaike Information Criterion (AIC), and the lasso. Finally, in each data set we obtain a final model that is better than the full model by most measures. The results of the model selection methods and of the final models are assessed systematically through validation to estimate independent performance.
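
The concordance index used as the optimization criterion above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `concordance_index`, the toy data, and the handling of ties (pairs with equal follow-up times are skipped, tied risk scores count as one half) are assumptions made for illustration. The sketch computes Harrell's C for right-censored data: among all pairs whose ordering is known, it reports the fraction in which the subject with the shorter observed survival time also has the higher predicted risk.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C for right-censored data (illustrative sketch).

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Let `a` be the subject with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        usable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: five subjects, two of them censored.
times = [5, 8, 3, 10, 6]
events = [1, 0, 1, 1, 0]
risks = [0.4, 0.3, 0.9, 0.1, 0.5]
c_index = concordance_index(times, events, risks)
print(round(c_index, 3))
```

In this toy example, 7 of the 10 pairs are usable (the other 3 have a censored shorter time) and 6 of those are concordant, giving C = 6/7. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a selection procedure can use the cross-validated C as the quantity to maximize.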