Many performance metrics have been introduced in the literature for the evaluation of classification performance, each with its own origins and areas of application. These metrics include accuracy, unweighted accuracy, the area under the ROC curve (AUC) or the ROC convex hull, mean absolute error, and the Brier score or mean squared error (with its decomposition into refinement and calibration). One way of understanding the relations among these metrics is through variable operating conditions (in the form of misclassification costs and/or class distributions): a metric can then be seen as an expected loss over a range of operating conditions. One dimension of this analysis is the distribution assumed over this range of operating conditions, which has led to important connections with proper scoring rules. We demonstrate in this paper that there is an equally important dimension which has so far received much less attention in the analysis of performance metrics. This dimension is given by the decision rule, typically implemented as a threshold choice method when using scoring models. We explore several existing and new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among others. By calculating the expected loss obtained with each of these threshold choice methods over a uniform range of operating conditions, we give clear interpretations of the 0-1 loss, the absolute error, the Brier score, the AUC and the refinement loss, respectively. Our analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, which can be summarised as follows: given a model, apply the threshold choice methods that correspond to the available information about the operating condition, and compare their expected losses. To assist in this procedure we also derive several connections between the aforementioned performance metrics, and we highlight the role of calibration in choosing the threshold choice method.
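To make the threshold-choice perspective concrete, the sketch below checks two of the stated correspondences numerically on synthetic data. It is not the authors' code: the cost-proportion parameterisation (a false positive costs c, a false negative costs 1 - c, with a normalising factor of 2) is one common convention and is assumed here. Under a uniform distribution of operating conditions c, the score-driven threshold choice method (threshold t = c) should recover the Brier score, and the score-uniform method (threshold drawn uniformly, independently of c) should recover the mean absolute error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary problem: labels y in {0, 1}, scores s in [0, 1].
# The scores are deliberately imperfect so the metrics differ visibly.
n = 20_000
y = rng.integers(0, 2, size=n)
s = np.clip(0.5 * y + 0.25 + 0.2 * rng.standard_normal(n), 0.0, 1.0)

def loss(c, t):
    """Cost-sensitive loss at cost proportion c with decision threshold t.

    Predict positive when s > t; a false positive costs c, a false
    negative costs 1 - c, and the factor 2 normalises the loss scale
    (an assumed convention, chosen so the correspondences come out exactly).
    """
    fp = np.mean((y == 0) & (s > t))   # pi_0 * FPR(t)
    fn = np.mean((y == 1) & (s <= t))  # pi_1 * FNR(t)
    return 2.0 * (c * fp + (1.0 - c) * fn)

cs = np.linspace(0.0, 1.0, 1001)  # uniform grid of operating conditions

# Score-driven threshold choice: set the threshold equal to c.
score_driven = np.mean([loss(c, c) for c in cs])

# Score-uniform threshold choice: threshold drawn uniformly in [0, 1],
# independently of c. The loss is linear in c, so averaging over c first
# simply replaces c by 1/2.
score_uniform = np.mean([loss(0.5, t) for t in cs])

print(f"score-driven expected loss:  {score_driven:.4f}")
print(f"Brier score (MSE):           {np.mean((s - y) ** 2):.4f}")
print(f"score-uniform expected loss: {score_uniform:.4f}")
print(f"absolute error (MAE):        {np.mean(np.abs(s - y)):.4f}")
```

Running this, the first pair of numbers agree up to the grid resolution, as do the second pair, illustrating how the same model yields different expected losses (here Brier score versus MAE) purely as a consequence of the threshold choice method.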