A systematic analysis of performance measures for classification tasks

Authors:
Marina Sokolova; Guy Lapalme
Affiliations:
Electronic Health Information Lab, Children's Hospital of Eastern Ontario, Ottawa, Canada; Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, Canada
Venue:
Information Processing and Management: an International Journal
Year:
2009


Abstract

This paper presents a systematic analysis of twenty-four performance measures used across the complete spectrum of Machine Learning classification tasks: binary, multi-class, multi-labelled, and hierarchical. For each classification task, the study relates a set of changes in a confusion matrix to specific characteristics of the data. The analysis then concentrates on the types of changes to a confusion matrix that do not change a measure and therefore preserve a classifier's evaluation (measure invariance). The result is a taxonomy of measure invariance with respect to all relevant label distribution changes in a classification problem. This formal analysis is supported by examples of applications where the invariance properties of measures lead to a more reliable evaluation of classifiers. Case studies from text classification supplement the discussion.
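
The idea of measure invariance can be illustrated with a minimal sketch for the binary case. It uses only the standard definitions of accuracy, precision, and recall. The `Confusion` class, the `is_invariant` helper, and the counts are hypothetical and chosen for illustration; they are not taken from the paper. The sketch changes only the number of true negatives and checks which measures stay the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (hypothetical helper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def is_invariant(measure, before: Confusion, after: Confusion,
                 tol: float = 1e-12) -> bool:
    """True if the measure has the same value on both matrices."""
    return abs(measure(before) - measure(after)) <= tol


# Illustrative counts only; the second matrix differs in TN alone.
base = Confusion(tp=40, fp=10, fn=20, tn=30)
more_tn = Confusion(tp=40, fp=10, fn=20, tn=130)

for measure in (accuracy, precision, recall):
    print(measure.__name__, is_invariant(measure, base, more_tn))
```

Under these assumptions, precision and recall do not depend on the true-negative count, so they are unchanged when only that count changes, while accuracy is not.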