ROC confidence bands: an empirical evaluation

Authors:
Sofus A. Macskassy; Foster Provost; Saharon Rosset
Affiliations:
New York University, New York, NY; New York University, New York, NY; IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
ICML '05 Proceedings of the 22nd international conference on Machine learning
Year:
2005

Abstract

This paper is about constructing confidence bands around ROC curves. We first introduce to the machine learning community three band-generating methods from the medical field and evaluate how well they perform. Such confidence bands represent the region where the "true" ROC curve is expected to reside, with the designated confidence level. To assess the containment of the bands, we begin with a synthetic world where we know the true ROC curve: specifically, one where the class-conditional model scores are normally distributed. The only method that attains reasonable containment out of the box produces non-parametric, "fixed-width" bands (FWBs). Next we move to a context more appropriate for machine learning evaluations: bands that, with a given confidence level, will bound the performance of the model on future data. We introduce a correction to account for the larger uncertainty, and the widened FWBs continue to have reasonable containment. Finally, we assess the bands on 10 relatively large benchmark data sets. We conclude by recommending these FWBs, noting that, being non-parametric, they are especially attractive for machine learning studies, where the score distributions (1) clearly are not normal and (2) vary substantially from learning method to learning method, even for the same data set.
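
The synthetic containment check described in the abstract can be illustrated with a small sketch. This is not the paper's procedure: it assumes unit-variance normal scores for both classes, builds the band by shifting the empirical curve vertically by a bootstrap-derived distance `d`, and checks containment only on a grid of false positive rates. The function names (`empirical_roc`, `binormal_roc`, `fixed_width_band`) and all parameter values are hypothetical, and the correction for future data mentioned in the abstract is not included.

```python
import random
from bisect import bisect_right
from statistics import NormalDist


def empirical_roc(neg, pos, fpr_grid):
    """Empirical TPR at each target FPR (higher score = more positive)."""
    neg_desc = sorted(neg, reverse=True)
    pos_asc = sorted(pos)
    tprs = []
    for f in fpr_grid:
        k = int(f * len(neg_desc))  # number of negatives allowed above threshold
        if k >= len(neg_desc):
            tprs.append(1.0)
            continue
        thr = neg_desc[k]  # predict positive when score > thr
        n_above = len(pos_asc) - bisect_right(pos_asc, thr)
        tprs.append(n_above / len(pos_asc))
    return tprs


def binormal_roc(fpr_grid, mu):
    """True ROC when negatives ~ N(0, 1) and positives ~ N(mu, 1) (assumed)."""
    nd = NormalDist()
    return [nd.cdf(mu + nd.inv_cdf(f)) for f in fpr_grid]


def fixed_width_band(neg, pos, fpr_grid, n_boot=200, delta=0.05, rng=None):
    """Band of constant vertical half-width d around the empirical curve.

    d is the approximate (1 - delta) quantile of the maximum vertical
    deviation between bootstrap curves and the sample curve. Vertical
    displacement is a simplifying assumption of this sketch.
    """
    rng = rng or random.Random(0)
    base = empirical_roc(neg, pos, fpr_grid)
    max_devs = []
    for _ in range(n_boot):
        boot_neg = rng.choices(neg, k=len(neg))  # resample within each class
        boot_pos = rng.choices(pos, k=len(pos))
        boot = empirical_roc(boot_neg, boot_pos, fpr_grid)
        max_devs.append(max(abs(b - t) for b, t in zip(boot, base)))
    max_devs.sort()
    idx = min(len(max_devs) - 1, int((1 - delta) * len(max_devs)))
    d = max_devs[idx]
    lower = [max(0.0, t - d) for t in base]
    upper = [min(1.0, t + d) for t in base]
    return lower, upper, d


rng = random.Random(42)
mu = 1.5  # hypothetical class separation
neg = [rng.gauss(0.0, 1.0) for _ in range(500)]
pos = [rng.gauss(mu, 1.0) for _ in range(500)]
grid = [i / 20 for i in range(1, 20)]  # interior FPR values only

lower, upper, d = fixed_width_band(neg, pos, grid, rng=rng)
truth = binormal_roc(grid, mu)
contained = all(lo <= t <= hi for lo, t, hi in zip(lower, truth, upper))
print(f"half-width d = {d:.3f}, true curve contained on grid: {contained}")
```

Repeating the last block over many independently drawn samples and recording the fraction of runs in which `contained` is true gives an estimate of empirical containment, which can then be compared with the nominal level `1 - delta`.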