Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches

Authors:
Alberto Fernández;Victoria López;Mikel Galar;María José del Jesus;Francisco Herrera
Affiliations:
Department of Computer Science, University of Jaén, Jaén, Spain;Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, Granada, Spain;Department of Automatic and Computation, Public University of Navarra, Spain;Department of Computer Science, University of Jaén, Jaén, Spain;Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, Granada, Spain
Venue:
Knowledge-Based Systems
Year:
2013

Abstract

The class imbalance problem arises frequently in real-world engineering applications of classification. It is characterised by a highly uneven distribution of examples among the classes. The presence of multiple imbalanced classes is more restrictive when the aim of the final system is to classify each of the concepts of the problem as accurately as possible. The goal of this work is to provide a thorough experimental analysis that will allow us to determine the behaviour of the different approaches proposed in the specialised literature. First, we will make use of binarization schemes, i.e., one versus one and one versus all, in order to apply the standard approaches for binary imbalanced problems. Second, we will apply several ad hoc procedures which have been designed for the scenario of imbalanced data-sets with multiple classes. This experimental study will include several well-known algorithms from the literature, such as decision trees, support vector machines and instance-based learning, with the intention of obtaining global conclusions from different classification paradigms. The extracted findings will be supported by a statistical comparative analysis using more than 20 data-sets from the KEEL repository.
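
To make the two binarization schemes named in the abstract concrete, the following is a minimal sketch of how a multi-class label vector can be decomposed into one-versus-all and one-versus-one binary sub-problems, together with the imbalance ratio of each sub-problem. It is an illustration only and not the authors' implementation; the function names (`one_vs_all`, `one_vs_one`, `imbalance_ratio`) and the toy label counts are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def one_vs_all(labels):
    """Build one binary problem per class: that class (1) against all others (0)."""
    return {
        cls: [1 if y == cls else 0 for y in labels]
        for cls in sorted(set(labels))
    }


def one_vs_one(labels):
    """Build one binary problem per pair of classes, keeping only their examples.

    Each entry maps (a, b) to (indices, binary_labels), where class a is 1
    and class b is 0.
    """
    problems = {}
    for a, b in combinations(sorted(set(labels)), 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        problems[(a, b)] = (idx, [1 if labels[i] == a else 0 for i in idx])
    return problems


def imbalance_ratio(binary_labels):
    """Majority count divided by minority count (assumes both classes occur)."""
    counts = Counter(binary_labels)
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Hypothetical toy problem: 6 examples of "a", 3 of "b", 1 of "c".
    labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1

    for cls, y in one_vs_all(labels).items():
        print("OVA", cls, round(imbalance_ratio(y), 2))

    for pair, (_, y) in one_vs_one(labels).items():
        print("OVO", pair, round(imbalance_ratio(y), 2))
```

In this toy case the one-versus-all problem for the rarest class is more skewed (9 to 1) than any one-versus-one problem involving it (at most 6 to 1). That is one reason a binary imbalance technique may behave differently under the two decompositions; which scheme works better in practice is the empirical question the study addresses.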