Small-sample precision of ROC-related estimates

Authors:
Blaise Hanczar; Jianping Hua; Chao Sima; John Weinstein; Michael Bittner; Edward R. Dougherty
Venue:
Bioinformatics
Year:
2010

Citing 0
Cited 12

Transcriptional networks characterize ventricular dysfunction after myocardial infarction: A proof-of-concept investigation
Journal of Biomedical Informatics

An experimental comparison of cross-validation techniques for estimating the area under the ROC curve
Computational Statistics & Data Analysis

Research Article: Predicting protein-protein interactions using graph invariants and a neural network
Computational Biology and Chemistry

Uncertainty estimation with a finite dataset in the assessment of classification models
Computational Statistics & Data Analysis

Classifier variability: Accounting for training and testing
Pattern Recognition

Multi-stage modeling using fuzzy multi-criteria feature selection to improve survival prediction of ICU septic shock patients
Expert Systems with Applications: An International Journal

A New Measure of Classifier Performance for Gene Expression Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Substantial improvements in the set-covering projection classifier CHIRP (composite hypercubes on iterated random projections)
ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on the Best of SIGKDD 2011

The reliability of estimated confidence intervals for classification error rates when only a single sample is available
Pattern Recognition

Design and Analysis of Classifier Learning Experiments in Bioinformatics: Survey and Case Studies
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Classifier Ensemble Methods for Diagnosing COPD from Volatile Organic Compounds in Exhaled Air
International Journal of Knowledge Discovery in Bioinformatics

Predicting the risk of squamous dysplasia and esophageal squamous cell carcinoma using minimum classification error method
Computers in Biology and Medicine

Abstract

Motivation: Receiver operating characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples, these rates must be estimated from the training data, so a natural question arises: how well do the estimates of the AUC, TPR and FPR compare with the true metrics?

Results: Through a simulation study using data models and an analysis of real microarray data, we show that (i) for small samples, the root mean square differences between the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results.

Availability: Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html

Contact: edward@mail.ece.tamu.edu
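
The kind of comparison the abstract describes can be illustrated with a minimal sketch. This is not the authors' code or their experimental setup: it assumes a hypothetical two-class Gaussian model with identity covariance, swaps in a nearest-mean linear discriminant as a simplified stand-in for linear discriminant analysis, and uses only the resubstitution estimate of the AUC. All function names, parameter values and the seed are illustrative. Under these assumptions the true AUC of a trained discriminant has a closed form, so the estimated and true AUC can be compared over repeated small samples.

```python
import math
import random


def empirical_auc(neg_scores, pos_scores):
    """Mann-Whitney AUC estimate: share of (negative, positive) pairs
    ranked correctly, with ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def true_auc(w, delta):
    """Exact AUC of the score w.x when class 0 ~ N(0, I) and class 1 ~ N(delta, I)
    (assumed model): Phi(w.delta / (sqrt(2) * ||w||))."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:
        return 0.5
    proj = sum(wi * di for wi, di in zip(w, delta))
    return normal_cdf(proj / (math.sqrt(2.0) * norm_w))


def sample_class(rng, mean, n):
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]


def one_trial(rng, delta, n_per_class):
    """Train a nearest-mean discriminant on a small sample and return
    (resubstitution AUC estimate, true AUC of that discriminant)."""
    dim = len(delta)
    x0 = sample_class(rng, [0.0] * dim, n_per_class)
    x1 = sample_class(rng, delta, n_per_class)
    m0 = [sum(x[j] for x in x0) / n_per_class for j in range(dim)]
    m1 = [sum(x[j] for x in x1) / n_per_class for j in range(dim)]
    w = [a - b for a, b in zip(m1, m0)]  # direction from class-0 mean to class-1 mean

    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    # Resubstitution: score the same points that were used for training.
    estimated = empirical_auc([score(x) for x in x0], [score(x) for x in x1])
    return estimated, true_auc(w, delta)


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def simulate(n_per_class=10, dim=5, shift=0.5, trials=300, seed=0):
    """Repeat the small-sample experiment and summarize estimated vs. true AUC."""
    rng = random.Random(seed)
    delta = [shift] * dim
    pairs = [one_trial(rng, delta, n_per_class) for _ in range(trials)]
    est = [e for e, _ in pairs]
    tru = [t for _, t in pairs]
    diffs = [e - t for e, t in pairs]
    return {
        "rms": math.sqrt(sum(d * d for d in diffs) / trials),
        "bias": sum(diffs) / trials,
        "corr": pearson(est, tru),
    }


if __name__ == "__main__":
    print(simulate())
```

Calling `simulate()` returns the root mean square difference, the mean difference (bias) and the Pearson correlation between the estimated and true AUC over the simulated samples. Replacing the resubstitution step with a cross-validation or bootstrap estimate would follow the same pattern, since only the computation of `estimated` changes.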