An experimental comparison of performance measures for classification

  • Authors:
  • C. Ferri; J. Hernández-Orallo; R. Modroiu

  • Affiliations:
  • Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, València 46022, Spain (all authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2009

Abstract

Performance metrics in classification are fundamental for assessing the quality of learning methods and learned models. However, many different measures have been defined in the literature with the aim of making better choices in general or for a specific application area, and the choices made by one metric are claimed to differ from those made by others. In this work, we analyse experimentally the behaviour of 18 different performance metrics in several scenarios, identifying clusters and relationships between measures. We also perform a sensitivity analysis for all of them in terms of several traits: class threshold choice, separability/ranking quality, calibration performance and sensitivity to changes in prior class distribution. From the definitions and experiments, we make a comprehensive analysis of the relationships between metrics, and a taxonomy and arrangement of them according to the previous traits. This can be useful for choosing the most adequate measure (or set of measures) for a specific application. Additionally, the study highlights some niches in which new measures might be defined, and shows that some supposedly innovative measures make the same (or almost the same) choices as existing ones. Finally, this work can also be used as a reference for comparing experimental results in the pattern recognition and machine learning literature when different measures are used.
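
To make the abstract's terminology concrete, the short sketch below (not from the paper) uses scikit-learn to compute a few of the kinds of measures such a study compares: threshold-based ones (accuracy, F1), a ranking measure (AUC), and calibration-oriented measures (Brier score, log loss). The toy labels, the two hypothetical models' probability estimates and the fixed 0.5 decision threshold are assumptions for illustration only.

```python
# Illustrative sketch (not from the paper): computing a handful of common
# classification performance measures with scikit-learn. Data values are
# hypothetical, chosen only to show that metrics can disagree.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                             brier_score_loss, log_loss)

# Toy binary problem: true labels and two models' estimated
# positive-class probabilities.
y_true    = np.array([0, 0, 0, 1, 1, 1, 1, 0])
p_model_a = np.array([0.1, 0.6, 0.35, 0.8, 0.4, 0.9, 0.55, 0.2])
p_model_b = np.array([0.3, 0.45, 0.2, 0.6, 0.7, 0.95, 0.5, 0.05])

for name, p in [("model A", p_model_a), ("model B", p_model_b)]:
    y_pred = (p >= 0.5).astype(int)  # fixed 0.5 decision threshold
    print(name,
          "acc=%.3f" % accuracy_score(y_true, y_pred),    # threshold-based
          "F1=%.3f" % f1_score(y_true, y_pred),           # threshold-based
          "AUC=%.3f" % roc_auc_score(y_true, p),          # ranking quality
          "Brier=%.3f" % brier_score_loss(y_true, p),     # calibration
          "logloss=%.3f" % log_loss(y_true, p))           # calibration
```

Running this on different toy inputs shows how two classifiers can be close on one measure yet differ markedly on another, which is the kind of disagreement the paper analyses systematically across 18 metrics.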