An empirical study of the behavior of classifiers on imbalanced and overlapped data sets

Authors:
Vicente García; Jose Sánchez; Ramon Mollineda
Affiliations:
Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, México and Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castelló de la Plana, Spain; Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castelló de la Plana, Spain; Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castelló de la Plana, Spain
Venue:
CIARP'07 Proceedings of the Congress on pattern recognition 12th Iberoamerican conference on Progress in pattern recognition, image analysis and applications
Year:
2007

Abstract

Class imbalance has been reported as an important obstacle to applying traditional learning algorithms to real-world domains. Recent investigations have questioned whether imbalance is the only factor that hinders classifier performance. In this paper, we study the behavior of six algorithms when classifying imbalanced, overlapped data sets under uncommon situations (e.g., when the overall imbalance ratio differs from the local imbalance ratio in the overlap region). This is accomplished by analyzing the accuracy on each individual class, thus revealing how those situations affect the majority and minority classes. The experiments corroborate that overlap is more important than imbalance for classification performance. They also show that the classifiers behave differently depending on the nature of each model.
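
The two quantities the abstract relies on, per-class accuracy and the overall versus local imbalance ratio, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy one-dimensional data, the overlap region fixed at [4, 6], the threshold rule standing in for a classifier, and the helper names `per_class_accuracy` and `imbalance_ratio`.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Fraction of each class's examples that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def imbalance_ratio(labels, majority="maj", minority="min"):
    """Majority-to-minority count ratio (inf when no minority examples)."""
    counts = Counter(labels)
    if counts[minority] == 0:
        return float("inf")
    return counts[majority] / counts[minority]


# Hypothetical toy data: (feature value, class label).
data = [
    (1.0, "maj"), (1.5, "maj"), (2.0, "maj"), (2.5, "maj"),
    (3.0, "maj"), (3.5, "maj"), (3.8, "maj"), (4.5, "maj"),
    (4.2, "min"), (5.0, "min"), (5.5, "min"), (7.0, "min"),
]

# Assumed overlap region: the interval where both classes appear.
OVERLAP_LOW, OVERLAP_HIGH = 4.0, 6.0

labels = [y for _, y in data]
overlap_labels = [y for x, y in data if OVERLAP_LOW <= x <= OVERLAP_HIGH]

overall_ir = imbalance_ratio(labels)        # 8 majority vs 4 minority
local_ir = imbalance_ratio(overlap_labels)  # 1 majority vs 3 minority

# Stand-in classifier (an assumption): a simple threshold on the feature.
y_pred = ["min" if x >= 4.0 else "maj" for x, _ in data]
acc = per_class_accuracy(labels, y_pred)

print(f"overall imbalance ratio: {overall_ir:.2f}")
print(f"local imbalance ratio in overlap: {local_ir:.2f}")
print(f"per-class accuracy: {acc}")
```

In this toy set the majority class outnumbers the minority class overall, while the minority class is the more frequent one inside the overlap region, which is the kind of situation the abstract calls uncommon. Reporting accuracy separately per class shows which class pays for errors in that region, something a single overall accuracy figure would hide.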