Model selection via the AUC

Authors: Saharon Rosset
Affiliations: IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue: ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year: 2004

Abstract

We present a statistical analysis of the AUC as an evaluation criterion for classification scoring models. First, we consider significance tests for the difference between AUC scores of two algorithms on the same test set. We derive exact moments under simplifying assumptions and use them to examine approximate practical methods from the literature. We then compare AUC to empirical misclassification error when the prediction goal is to minimize future error rate. We show that the AUC may be preferable to empirical error even in this case and discuss the tradeoff between approximation error and estimation error underlying this phenomenon.
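
The two criteria compared in the abstract can be illustrated with a minimal sketch. It computes the empirical AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as one half) and the empirical misclassification error for two scoring models on the same test set. The function names, the 0.5 threshold, and the toy data are assumptions made for illustration. The paired bootstrap interval for the AUC difference is a generic stand-in for a significance test, not the exact-moment analysis the paper derives.

```python
import random


def auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs where the
    positive scores higher; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Empirical misclassification error when predicting 1 iff score >= threshold.
    The threshold value is an assumption, not taken from the paper."""
    wrong = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


def bootstrap_auc_diff(scores_a, scores_b, labels, n_boot=2000, seed=0):
    """Paired bootstrap 95% percentile interval for AUC(a) - AUC(b).
    Both models are evaluated on the same resampled test cases."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample is missing a class, so AUC is undefined
        a = auc([scores_a[i] for i in idx], ys)
        b = auc([scores_b[i] for i in idx], ys)
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Toy test set (hypothetical data): both models make two errors at the
# 0.5 threshold, yet they rank the cases differently.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.6, 0.55, 0.52, 0.7, 0.58, 0.3, 0.2]

print(auc(scores_a, labels), error_rate(scores_a, labels))  # 0.9375 0.25
print(auc(scores_b, labels), error_rate(scores_b, labels))  # 0.6875 0.25
print(bootstrap_auc_diff(scores_a, scores_b, labels))
```

On this toy data the two models have the same empirical error but different AUC values, which shows only that the two criteria can disagree; it does not by itself establish which is the better selection criterion. With eight test cases the bootstrap interval is wide and should be read as a mechanical demonstration only.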