AUC: a better measure than accuracy in comparing learning algorithms

  • Authors:
  • Charles X. Ling; Jin Huang; Harry Zhang

  • Affiliations:
  • Department of Computer Science, The University of Western Ontario, London, Ontario, Canada; Department of Computer Science, The University of Western Ontario, London, Ontario, Canada; Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada

  • Venue:
  • AI'03: Proceedings of the 16th Canadian Society for Computational Studies of Intelligence Conference on Advances in Artificial Intelligence
  • Year:
  • 2003

Abstract

Predictive accuracy has been widely used as the main criterion for comparing the predictive ability of classification systems (such as C4.5, neural networks, and Naive Bayes). Most of these classifiers also produce probability estimates of their classifications, but these estimates are completely ignored by the accuracy measure. This is often taken for granted because both training and testing sets provide only class labels. In this paper we establish rigorously that, even in this setting, the area under the ROC (Receiver Operating Characteristic) curve, or simply AUC, provides a better measure than accuracy. Our result is significant for three reasons. First, we establish, for the first time, rigorous criteria for comparing evaluation measures for learning algorithms. Second, it suggests that AUC should replace accuracy when measuring and comparing classification systems. Third, our result prompts us to reevaluate many well-established conclusions in machine learning that are based on accuracy. For example, it is widely accepted in the machine learning community that, in terms of predictive accuracy, Naive Bayes and decision trees perform very similarly. Using AUC, however, we show experimentally that Naive Bayes is significantly better than decision-tree learning algorithms.
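
To make the abstract's distinction concrete, here is a minimal sketch (not from the paper; the toy scores and helper functions are hypothetical) that computes both measures for two classifiers whose thresholded predictions are indistinguishable: their accuracies are identical, yet AUC, which evaluates the full ranking induced by the probability estimates, separates them.

```python
# A minimal sketch, not the authors' code: the scores below are made-up
# probability estimates for six test examples (three positive, three negative).

def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def auc(labels, scores):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the probability that a
    randomly chosen positive is ranked above a randomly chosen negative
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
clf_a  = [0.90, 0.80, 0.60, 0.55, 0.20, 0.10]  # ranks every positive above every negative
clf_b  = [0.90, 0.80, 0.52, 0.55, 0.20, 0.10]  # misranks one (positive, negative) pair

print(accuracy(labels, clf_a), accuracy(labels, clf_b))  # both 0.8333... -- tied
print(auc(labels, clf_a), auc(labels, clf_b))            # 1.0 vs 0.8888... -- AUC separates them
```

The pairwise-counting form used here is equivalent to the area under the ROC curve, which is why AUC can distinguish rankings that a single decision threshold collapses into the same accuracy.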