AUC: a better measure than accuracy in comparing learning algorithms
AI'03 Proceedings of the 16th Canadian society for computational studies of intelligence conference on Advances in artificial intelligence
Predictive accuracy has been used as the main, and often only, criterion for evaluating the predictive performance of classification learning algorithms. In recent years, the area under the ROC (Receiver Operating Characteristic) curve, or simply AUC, has been proposed as an alternative single-number measure for evaluating learning algorithms. In this paper, we prove that AUC is a better measure than accuracy. More specifically, we present rigorous definitions of consistency and discriminancy for comparing two evaluation measures for learning algorithms. We then present empirical evaluations and a formal proof establishing that AUC is indeed statistically consistent with, and more discriminating than, accuracy. This result is significant because, to our knowledge, it is the first formal proof that AUC is a better measure than accuracy for evaluating learning algorithms.
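To make the discriminancy claim concrete, the sketch below (illustrative code, not from the paper) computes accuracy at a fixed 0.5 threshold and AUC via the Wilcoxon-Mann-Whitney statistic on a hypothetical toy dataset. The two score vectors `scores_a` and `scores_b` are chosen so that both classifiers have the same accuracy, yet AUC still separates them because it is sensitive to the full ranking rather than only to which side of the threshold each score falls.

```python
def auc(labels, scores):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counting half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Accuracy after thresholding scores into hard 0/1 predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Toy example (hypothetical): three positives followed by three negatives.
labels = [1, 1, 1, 0, 0, 0]
# Both classifiers misclassify the same two examples at threshold 0.5,
# so their accuracies are identical (4/6), but classifier A ranks the
# misclassified positive higher than B does, so its AUC is higher.
scores_a = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]   # AUC = 8/9
scores_b = [0.9, 0.8, 0.1, 0.6, 0.3, 0.2]   # AUC = 6/9

print(accuracy(labels, scores_a), accuracy(labels, scores_b))  # equal
print(auc(labels, scores_a), auc(labels, scores_b))            # differ
```

Accuracy alone cannot distinguish the two classifiers here, while AUC can — a small instance of the "more discriminating" property the paper proves in general.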