Evolutionary data analysis for the class imbalance problem

Authors:
Taghi M. Khoshgoftaar;Naeem Seliya;Dennis J. Drown
Affiliations:
(Corresponding author, e-mail: taghi@cse.fau.edu) Computer and Electrical Engineering and Computer Science Department, Florida Atlantic University, FL, USA;Computer and Information Science Department, University of Michigan - Dearborn, MI, USA;Computer and Electrical Engineering and Computer Science Department, Florida Atlantic University, FL, USA
Venue:
Intelligent Data Analysis
Year:
2010

Abstract

Class imbalance, where the classes in a dataset are not represented equally, is a common occurrence in machine learning. Classification models built from such datasets are often impractical, because most machine learning algorithms tend to perform poorly on minority-class instances. We present a new evolutionary computing-based data sampling approach as an effective solution to the class imbalance problem. The approach, Evolutionary Sampling, is based on a genetic algorithm and works as a majority undersampling technique: instances of the majority class are selectively removed. This preserves the relative integrity of the majority class while leaving the original minority class intact. Our research prototype, eVann, also implements genetic algorithm-based optimization of the modeling parameters of the machine learning algorithms considered in our study. An extensive empirical investigation involving four real-world datasets compares the proposed approach with other existing data sampling techniques that target the class imbalance problem. Our results demonstrate that Evolutionary Sampling, both with and without learner optimization, performs relatively better than the other data sampling techniques. The case studies are described in enough detail to support empirical replication.
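The general idea of genetic-algorithm-based majority undersampling can be illustrated with a small sketch. This is not the eVann implementation, and the abstract does not specify its fitness function, genetic operators, or parameters, so all of those are assumptions here. Each chromosome is a bit mask over the majority-class instances (True = keep). Every minority instance is always kept. Fitness is the geometric mean of true-positive and true-negative rates of a 1-nearest-neighbour classifier on a held-out validation set. The GA uses tournament selection, uniform crossover, bit-flip mutation, and elitism. All function names (`evolutionary_undersample`, `g_mean`, `one_nn_predict`, `build_training_set`, `tournament`) and default parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Labeled = Tuple[Point, int]  # label 1 = minority (positive), 0 = majority (negative)


def one_nn_predict(train: Sequence[Labeled], x: Point) -> int:
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    _, label = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return label


def g_mean(train: Sequence[Labeled], validation: Sequence[Labeled]) -> float:
    """Assumed fitness: sqrt(TPR * TNR), which stays low unless BOTH classes are recognized."""
    tp = fn = tn = fp = 0
    for x, y in validation:
        pred = one_nn_predict(train, x)
        if y == 1:
            tp, fn = tp + (pred == 1), fn + (pred != 1)
        else:
            tn, fp = tn + (pred == 0), fp + (pred != 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def build_training_set(majority: Sequence[Point], minority: Sequence[Point],
                       mask: Sequence[bool]) -> List[Labeled]:
    """Undersampling only: every minority instance is kept; a majority
    instance is kept only if its gene in the mask is True."""
    kept = [(x, 0) for x, keep in zip(majority, mask) if keep]
    return kept + [(x, 1) for x in minority]


def tournament(scored, rng: random.Random, k: int = 2) -> List[bool]:
    """Return the mask of the fittest of k randomly chosen (fitness, mask) pairs."""
    return max(rng.sample(scored, k), key=lambda s: s[0])[1]


def evolutionary_undersample(majority, minority, validation,
                             pop_size=20, generations=30, p_mut=None, seed=0
                             ) -> Tuple[float, List[bool]]:
    """Hypothetical GA over majority-class bit masks; returns (best_fitness, best_mask).
    Assumes majority and minority are non-empty and pop_size >= 2."""
    rng = random.Random(seed)
    n = len(majority)
    p_mut = 1.0 / n if p_mut is None else p_mut  # roughly one bit flip per child

    def fitness(mask):
        return g_mean(build_training_set(majority, minority, mask), validation)

    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    scored = [(fitness(m), m) for m in population]
    for _ in range(generations):
        scored.sort(key=lambda s: s[0], reverse=True)
        next_scored = [scored[0]]  # elitism: the best mask always survives
        while len(next_scored) < pop_size:
            a, b = tournament(scored, rng), tournament(scored, rng)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]     # bit-flip mutation
            next_scored.append((fitness(child), child))
        scored = next_scored
    return max(scored, key=lambda s: s[0])


if __name__ == "__main__":
    # Synthetic 2-D data, roughly 7.5:1 imbalance in the training pool.
    data_rng = random.Random(42)
    majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
    minority = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(8)]
    validation = ([((data_rng.gauss(0, 1), data_rng.gauss(0, 1)), 0) for _ in range(30)]
                  + [((data_rng.gauss(2, 1), data_rng.gauss(2, 1)), 1) for _ in range(10)])
    best_fit, best_mask = evolutionary_undersample(majority, minority, validation)
    print(f"kept {sum(best_mask)}/{len(majority)} majority instances, g-mean={best_fit:.3f}")
```

The geometric mean is used here instead of plain accuracy because a classifier that always predicts the majority class can score high accuracy on imbalanced data while recognizing no minority instances at all. The sketch covers only the undersampling step. The GA-based tuning of learner parameters mentioned in the abstract is not shown.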