Several performance metrics have been proposed for evaluating a classification model, e.g., accuracy, error rate, precision, and recall. While evaluating a classifier on only one performance metric is known to be inadvisable, the use of multiple performance metrics poses unique comparative challenges for the analyst. Because different metrics offer different perspectives on the classifier performance space, it is common for a learner to be relatively better on one metric and worse on another. We present a novel approach to aggregating several individual performance metrics into a single metric, called the Relative Performance Metric (RPM). A large case study consisting of 35 real-world classification datasets, 12 classification algorithms, and 10 commonly used performance metrics illustrates the practical appeal of RPM. The empirical results clearly demonstrate the benefits of using RPM when classifier evaluation requires the consideration of a large number of individual performance metrics.
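The abstract does not specify how RPM combines the individual metrics, so the following is only a minimal sketch of one plausible aggregation scheme: ranking each classifier within every metric and averaging the ranks. The classifier names and score values are hypothetical, and this should not be read as the paper's actual RPM definition.

```python
# Hypothetical illustration only: the abstract does not give RPM's formula,
# so this sketch aggregates metrics by averaging per-metric ranks.
import numpy as np

# Example scores: rows = classifiers, columns = performance metrics
# (e.g., accuracy, AUC, F-measure); higher is assumed to be better.
scores = np.array([
    [0.91, 0.88, 0.74],   # classifier A
    [0.89, 0.93, 0.70],   # classifier B
    [0.90, 0.90, 0.78],   # classifier C
])

# Rank classifiers within each metric (rank 1 = best on that metric).
ranks = scores.shape[0] - scores.argsort(axis=0).argsort(axis=0)

# One possible aggregate score: the mean rank across all metrics.
mean_rank = ranks.mean(axis=1)

for name, r in zip("ABC", mean_rank):
    print(f"classifier {name}: mean rank = {r:.2f}")
```

Under this assumed scheme, a classifier that is merely competitive on every metric can outrank one that excels on a single metric but fails on others, which reflects the motivation stated in the abstract for considering many metrics jointly.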