To successfully translate genomic classifiers into clinical practice, it is essential to obtain reliable and reproducible measurements of classifier performance. A point estimate of classifier performance must therefore be accompanied by a measure of its uncertainty. In general, this uncertainty arises from both the finite size of the training set and the finite size of the test set. The training variability is a measure of classifier stability and is particularly important when the training sample is small. Methods have been developed for estimating this variability for the performance metric AUC (area under the ROC curve) under two paradigms: a smoothed cross-validation paradigm and an independent validation paradigm. The methodology is demonstrated on three clinical microarray datasets from phase two of the MicroArray Quality Control consortium project (MAQC-II): breast cancer, multiple myeloma, and neuroblastoma. The results show that the classifier performance estimates carry large variability and may change dramatically across datasets. Moreover, for the datasets and models considered, the training variability is of the same order as the testing variability. In conclusion, quantifying both the training and testing variability of classifier performance is shown to be feasible on finite real-world datasets. The large variability of the performance estimates indicates that patient sample size remains the bottleneck in microarray-based classification and that the training variability is not negligible.
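To make the training/testing decomposition concrete, the following is a minimal Python sketch (not the paper's smoothed cross-validation or independent-validation estimators) of how the two sources of AUC variability can be probed empirically: training variability by retraining on independent small training draws and scoring on a fixed test set, and testing variability by fixing one trained model and bootstrapping the test set. The synthetic data, sample sizes, and logistic-regression model are illustrative assumptions, not the MAQC-II setup.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=1000, random_state=0)

    # Training variability: retrain on independent small training draws,
    # score every retrained model on the same fixed test set.
    aucs_train = []
    for _ in range(50):
        idx = rng.choice(len(X_pool), size=100, replace=False)  # small n_train
        clf = LogisticRegression(max_iter=1000).fit(X_pool[idx], y_pool[idx])
        aucs_train.append(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

    # Testing variability: fix one trained model, bootstrap the test set.
    clf = LogisticRegression(max_iter=1000).fit(X_pool[:100], y_pool[:100])
    scores = clf.predict_proba(X_test)[:, 1]
    aucs_test = []
    for _ in range(50):
        b = rng.choice(len(y_test), size=len(y_test), replace=True)
        if len(np.unique(y_test[b])) == 2:  # AUC requires both classes present
            aucs_test.append(roc_auc_score(y_test[b], scores[b]))

    print(f"training SD of AUC: {np.std(aucs_train):.4f}")
    print(f"testing  SD of AUC: {np.std(aucs_test):.4f}")

With a training sample of 100, the two standard deviations are typically of comparable magnitude, mirroring the abstract's finding that training variability is not negligible relative to testing variability when the training set is small.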