Many performance metrics have been introduced in the literature for the evaluation of classification performance, each with its own origins and areas of application. These metrics include accuracy, unweighted accuracy, the area under the ROC curve (AUC) or the ROC convex hull, mean absolute error, and the Brier score or mean squared error (with its decomposition into refinement and calibration). One way of understanding the relations among these metrics is through variable operating conditions (in the form of misclassification costs and/or class distributions): a metric can then be seen as an expected loss over a range of operating conditions. One dimension of this analysis is the distribution assumed over this range of operating conditions, which has led to important connections with proper scoring rules. We demonstrate in this paper that there is an equally important dimension which has so far received much less attention in the analysis of performance metrics. This dimension is given by the decision rule, typically implemented as a threshold choice method when using scoring models. We explore several existing and new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among others. By calculating the expected loss obtained with each of these threshold choice methods over a uniform range of operating conditions, we give clear interpretations of the 0-1 loss, the absolute error, the Brier score, the AUC and the refinement loss, respectively. Our analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, which can be summarised as follows: given a model, apply the threshold choice methods that correspond to the available information about the operating condition, and compare their expected losses. To assist in this procedure we also derive several connections between the aforementioned performance metrics, and we highlight the role of calibration in choosing the threshold choice method.
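To make the threshold-choice perspective concrete, the sketch below checks two of the stated correspondences numerically on synthetic data. It is not the authors' code: the cost-proportion parameterisation (a false positive costs c, a false negative costs 1 - c, with a normalising factor of 2) is one common convention and is assumed here. Under a uniform distribution of operating conditions c, the score-driven threshold choice method (threshold t = c) should recover the Brier score, and the score-uniform method (threshold drawn uniformly, independently of c) should recover the mean absolute error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary problem: labels y in {0, 1}, scores s in [0, 1].
# The scores are deliberately imperfect so the metrics differ visibly.
n = 20_000
y = rng.integers(0, 2, size=n)
s = np.clip(0.5 * y + 0.25 + 0.2 * rng.standard_normal(n), 0.0, 1.0)

def loss(c, t):
    """Cost-sensitive loss at cost proportion c with decision threshold t.

    Predict positive when s > t; a false positive costs c, a false
    negative costs 1 - c, and the factor 2 normalises the loss scale
    (an assumed convention, chosen so the correspondences come out exactly).
    """
    fp = np.mean((y == 0) & (s > t))   # pi_0 * FPR(t)
    fn = np.mean((y == 1) & (s <= t))  # pi_1 * FNR(t)
    return 2.0 * (c * fp + (1.0 - c) * fn)

cs = np.linspace(0.0, 1.0, 1001)  # uniform grid of operating conditions

# Score-driven threshold choice: set the threshold equal to c.
score_driven = np.mean([loss(c, c) for c in cs])

# Score-uniform threshold choice: threshold drawn uniformly in [0, 1],
# independently of c. The loss is linear in c, so averaging over c first
# simply replaces c by 1/2.
score_uniform = np.mean([loss(0.5, t) for t in cs])

print(f"score-driven expected loss:  {score_driven:.4f}")
print(f"Brier score (MSE):           {np.mean((s - y) ** 2):.4f}")
print(f"score-uniform expected loss: {score_uniform:.4f}")
print(f"absolute error (MAE):        {np.mean(np.abs(s - y)):.4f}")
```

Running this, the first pair of numbers agree up to the grid resolution, as do the second pair, illustrating how the same model yields different expected losses (here Brier score versus MAE) purely as a consequence of the threshold choice method.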