Motivation: Survival prediction from gene expression data and other high-dimensional genomic data has been the subject of much research in recent years. Such data pose the methodological problem of having far more gene expression values (covariates) than individuals; in addition, the responses are censored survival times. Most of the proposed methods handle this by fitting Cox's proportional hazards model and obtaining parameter estimates through some dimension reduction or parameter shrinkage technique. Using three well-known microarray gene expression data sets, we compare the prediction performance of seven such methods: univariate selection, forward stepwise selection, principal components regression (PCR), supervised principal components regression, partial least squares regression (PLS), ridge regression, and the lasso.

Results: Statistical learning from data subsets should be repeated several times in order to obtain a fair comparison between methods. Methods using coefficient shrinkage or linear combinations of the gene expression values perform much better than the simple variable selection methods. For our data sets, ridge regression has the best overall performance.

Availability: Matlab and R code for the prediction methods are available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/.

Contact: hegembo@math.uio.no
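To make the comparison concrete, the ridge-penalized Cox model that performed best here can be sketched in a few lines of NumPy. This is a minimal illustrative implementation, not the authors' code: the function names, the gradient-descent fitting loop, and the absence of tie handling (Breslow-style risk sets with distinct event times assumed) are all simplifications for exposition.

```python
import numpy as np

def cox_ridge_gradient(beta, X, time, event, lam):
    """Gradient of the negative Cox partial log-likelihood plus an L2 (ridge) penalty.

    Subjects are sorted by decreasing survival time so that the risk set of
    subject i (everyone with t_j >= t_i) is simply the sorted prefix 0..i.
    """
    order = np.argsort(-time)
    Xs, ds = X[order], event[order]          # sorted covariates and event indicators
    w = np.exp(Xs @ beta)                    # relative hazards exp(x_j' beta)
    cum_w = np.cumsum(w)                     # risk-set denominators
    cum_wx = np.cumsum(w[:, None] * Xs, axis=0)
    # score contribution of each observed event: x_i minus the risk-set mean
    grad_ll = (ds[:, None] * (Xs - cum_wx / cum_w[:, None])).sum(axis=0)
    return -grad_ll + 2.0 * lam * beta       # penalized negative log-likelihood gradient

def fit_cox_ridge(X, time, event, lam=1.0, lr=0.01, n_iter=2000):
    """Fit a ridge-penalized Cox model by plain gradient descent (illustration only)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta -= lr * cox_ridge_gradient(beta, X, time, event, lam)
    return beta

# Toy usage on simulated data (50 subjects, 3 covariates, no censoring):
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -1.0, 0.0])
time = rng.exponential(np.exp(-(X @ beta_true)))  # hazard proportional to exp(x' beta)
event = np.ones(50)                               # all events observed
beta_hat = fit_cox_ridge(X, time, event, lam=0.5)
```

In practice p is far larger than n for microarray data, so the penalty parameter lam is chosen by cross-validation and the linear algebra is organized to exploit p >> n; the sketch above only shows the estimating equations being solved.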