Performance metrics are fundamental to assessing the quality of learning methods and learned models in classification. Many different measures have been defined in the literature, each intended to support better choices either in general or for a specific application area, and the choices made by one metric are claimed to differ from those made by others. In this work, we experimentally analyse the behaviour of 18 performance metrics in several scenarios, identifying clusters and relationships between measures. We also perform a sensitivity analysis of all of them with respect to four traits: class threshold choice, separability/ranking quality, calibration performance, and sensitivity to changes in the prior class distribution. From the definitions and the experiments, we derive a comprehensive analysis of the relationships between metrics, together with a taxonomy and arrangement of them according to these traits. This can help in choosing the most adequate measure (or set of measures) for a specific application. The study also highlights niches in which new measures might be defined, and shows that some supposedly innovative measures make the same, or almost the same, choices as existing ones. Finally, this work can serve as a reference for comparing experimental results in the pattern recognition and machine learning literature when different measures are used.
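As a small illustration of how two metrics can make different choices, the sketch below scores two hypothetical classifiers on a six-example toy dataset. The data, the model names, and the selection of accuracy, AUC and Brier score are assumptions made for illustration only; they are not taken from the study's experiments. The three measures were picked because they respond, respectively, to the class threshold, to ranking quality, and to the quality of the probability estimates.

```python
# Toy illustration (hypothetical data, not from the study): two classifiers
# are ranked differently depending on which metric is used.

def accuracy(y_true, scores, threshold=0.5):
    """Fraction of correct labels after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count one half), computed by comparing all positive/negative pairs."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, scores):
    """Mean squared difference between predicted probability and label
    (lower is better)."""
    return sum((s - y) ** 2 for s, y in zip(scores, y_true)) / len(y_true)

# Hypothetical labels and predicted probabilities of the positive class.
Y_TRUE = [1, 1, 1, 0, 0, 0]
SCORES_A = [0.90, 0.80, 0.40, 0.60, 0.30, 0.20]  # good ranking, two threshold errors
SCORES_B = [0.55, 0.52, 0.51, 0.53, 0.10, 0.05]  # one threshold error, weaker ranking

if __name__ == "__main__":
    for name, scores in (("A", SCORES_A), ("B", SCORES_B)):
        print(name,
              "accuracy=%.3f" % accuracy(Y_TRUE, scores),
              "auc=%.3f" % auc(Y_TRUE, scores),
              "brier=%.3f" % brier(Y_TRUE, scores))
```

On this toy data, accuracy at a 0.5 threshold prefers model B, while AUC prefers model A, so the "best" model depends on the measure. Disagreements of this kind are what motivate analysing the relationships between metrics.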