Bootstrap estimated true and false positive rates and ROC curve

Authors:
Werner Adler; Berthold Lausen
Affiliations:
Department of Biometry and Epidemiology, University Erlangen-Nuremberg, Waldstr. 6, 91054 Erlangen, Germany (both authors)
Venue:
Computational Statistics & Data Analysis
Year:
2009

Abstract

Diagnostic studies and new biomarkers are assessed by the estimated true and false positive rates of the classification rule. Here, a single diagnostic rule is considered for high-dimensional predictor data. Cross-validation and the leave-one-out bootstrap are discussed for estimating the true and false positive rates of classifiers built with the machine learning methods AdaBoost, Bagging, Random Forest, (penalized) logistic regression and support vector machines. The .632+ bootstrap estimator of the misclassification error was previously proposed to adjust for the overfitting (optimistic bias) of the apparent error; this idea is generalized here to the estimation of true and false positive rates. Tree-based simulation models with 8 and 50 binary non-informative variables are analysed to examine the properties of the estimators. Finally, a bootstrap estimator of receiver operating characteristic (ROC) curves is suggested, and a .632+ bootstrap estimator of ROC curves is discussed. The approach is applied to high-dimensional gene expression data on leukemia and to predictors derived from image data for glaucoma diagnosis.
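To make the idea of carrying the .632+ correction over from the misclassification error to class-specific rates concrete, here is a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a nearest-centroid rule stands in for AdaBoost, Bagging and the other learners; the true positive rate is handled through the false negative rate (FNR = 1 − TPR), so both quantities are "error-type" rates; the leave-one-out bootstrap rate is computed per class by averaging each observation's out-of-bag error; and the no-information rate for the FPR is assumed to be the overall share of positive predictions q1 (with 1 − q1 for the FNR), since a rule whose predictions ignore the true class predicts "positive" with the same probability in both classes. The weighting uses one common form of the .632+ estimator, ŵ = 0.632 / (1 − 0.368 R̂). All function names are hypothetical.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: store the mean feature vector of each class."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    return cents


def predict(cents, x):
    """Assign x to the class with the closer centroid (squared Euclidean distance)."""
    d = {k: sum((a - b) ** 2 for a, b in zip(x, c)) for k, c in cents.items()}
    return 1 if d[1] < d[0] else 0


def rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for 0/1 labels."""
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(neg) / len(neg), sum(1 - p for p in pos) / len(pos)


def plus632(err_app, err_loo, gamma):
    """Combine an apparent rate and a leave-one-out bootstrap rate with .632+ weights.

    gamma is the no-information rate; err_loo is capped at gamma.
    """
    err_loo = min(err_loo, gamma)
    if err_loo > err_app and gamma > err_app:
        r = (err_loo - err_app) / (gamma - err_app)  # relative overfitting rate in [0, 1]
    else:
        r = 0.0
    w = 0.632 / (1 - 0.368 * r)  # ranges from 0.632 (r = 0) to 1 (r = 1)
    return (1 - w) * err_app + w * err_loo


def loo_boot_rates(X, y, B=200, seed=0):
    """Leave-one-out bootstrap FPR and FNR: each observation is scored only by
    models fitted on bootstrap samples that do not contain it."""
    rng = random.Random(seed)
    n = len(y)
    errs = [[] for _ in range(n)]
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        while len({y[i] for i in idx}) < 2:  # redraw until both classes are present
            idx = [rng.randrange(n) for _ in range(n)]
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        inbag = set(idx)
        for i in range(n):
            if i not in inbag:
                errs[i].append(int(predict(model, X[i]) != y[i]))

    def class_rate(label):
        # average the per-observation out-of-bag error within one true class
        return mean(mean(e) for e, t in zip(errs, y) if t == label and e)

    return class_rate(0), class_rate(1)


def plus632_tpr_fpr(X, y, B=200, seed=0):
    """Return (.632+ TPR, .632+ FPR) for the nearest-centroid stand-in classifier."""
    model = fit_centroids(X, y)
    pred = [predict(model, x) for x in X]  # apparent (resubstitution) predictions
    fpr_app, fnr_app = rates(y, pred)
    fpr_1, fnr_1 = loo_boot_rates(X, y, B, seed)
    q1 = mean(pred)  # share of positive predictions on the training data
    # Assumed no-information rates: if predictions ignored the true class,
    # P(pred = 1 | y = 0) = q1 and P(pred = 0 | y = 1) = 1 - q1.
    fpr = plus632(fpr_app, fpr_1, gamma=q1)
    fnr = plus632(fnr_app, fnr_1, gamma=1 - q1)
    return 1 - fnr, fpr


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(30)]
    y = [0] * 30 + [1] * 30
    tpr, fpr = plus632_tpr_fpr(X, y, B=100, seed=2)
    print(f"TPR(.632+) = {tpr:.3f}, FPR(.632+) = {fpr:.3f}")
```

Working with the FNR instead of the TPR keeps the correction direction uniform: overfitting makes both apparent FPR and apparent FNR too small, so the .632+ weight always pulls them up toward the leave-one-out bootstrap values, but never beyond the assumed no-information rate.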