Addressing the classification with imbalanced data: open problems and new challenges on class distribution

Authors:
A. Fernández; S. García; F. Herrera
Affiliations:
Dept. of Computer Science, University of Jaén; Dept. of Computer Science and A.I., University of Granada; Dept. of Computer Science, University of Jaén
Venue:
HAIS'11 Proceedings of the 6th international conference on Hybrid artificial intelligent systems - Volume Part I
Year:
2011

Abstract

Classifier learning with datasets that suffer from imbalanced class distributions is an important problem in data mining. This issue occurs when the number of examples representing one class is much lower than the number representing the other classes. Its presence in many real-world applications has attracted growing attention from researchers. The aim of this work is to briefly review the main issues of this problem and to describe two common approaches for dealing with imbalance, namely sampling and cost-sensitive learning. Additionally, we pay special attention to some open problems. In particular, we discuss the intrinsic data characteristics of the imbalanced classification problem that can point to new paths for improving current models, namely the size of the dataset, small disjuncts, the overlapping between the classes, and the data fracture between the training and test distributions.
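The two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: `imbalance_ratio`, `random_oversample` and `inverse_frequency_weights` are hypothetical helpers written only for illustration, and the imbalance ratio is assumed to be the size of the largest class divided by the size of the smallest. Random oversampling is the simplest sampling approach (it duplicates existing minority examples rather than synthesising new ones). The weights follow the inverse-class-frequency formula n / (k · n_c), the same one scikit-learn uses for `class_weight="balanced"`, as a simple stand-in for the misclassification costs of cost-sensitive learning.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest (IR)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(examples, labels, seed=0):
    """Sampling approach: duplicate randomly chosen examples of the smaller
    classes until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(examples), list(labels)
    for cls, n in counts.items():
        members = [x for x, y in zip(examples, labels) if y == cls]
        for _ in range(target - n):  # zero iterations for the largest class
            out_x.append(rng.choice(members))
            out_y.append(cls)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Cost-sensitive approach: per-class weights n / (k * n_c), so rarer
    classes cost more to misclassify and the weighted total stays n."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Toy binary problem: eight majority (0) and two minority (1) examples.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(imbalance_ratio(y))             # 4.0
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))                 # Counter({0: 8, 1: 8})
print(inverse_frequency_weights(y))   # {0: 0.625, 1: 2.5}
```

The two functions reflect the different places where each approach acts: oversampling changes the training data and leaves the learner untouched, while the weights leave the data untouched and must be passed to a learner that accepts per-example or per-class weights. In either case only the training partition should be rebalanced, so that performance is still estimated on the original class distribution.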