Class imbalance, where the classes in a dataset are not represented equally, is common in machine learning. Classification models built from such datasets are often impractical because most machine learning algorithms tend to perform poorly on minority class instances. We present an evolutionary computing-based data sampling approach as an effective solution to the class imbalance problem. The genetic algorithm-based approach, Evolutionary Sampling, is a majority undersampling technique that selectively removes instances from the majority class. It preserves the relative integrity of the majority class while leaving the original minority class group intact. Our research prototype, eVann, also implements genetic algorithm-based optimization of modeling parameters for the machine learning algorithms considered in our study. An extensive empirical investigation on four real-world datasets compares the proposed approach with existing data sampling techniques that target the class imbalance problem. Our results show that Evolutionary Sampling, both with and without learner optimization, performs better than the other data sampling techniques compared. The case studies are described in enough detail to support empirical replication.
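To make the idea of genetic algorithm-based majority undersampling concrete, the following is a minimal sketch and not the eVann implementation. Everything in it is an assumption made for illustration: the bit-mask encoding (one bit per majority instance), the toy nearest-centroid learner, the geometric-mean fitness measured on the training data, and the names `fitness` and `evolutionary_undersample`. The abstract does not specify any of these details.

```python
import random
from math import sqrt


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(mask, majority, minority):
    """Assumed fitness: geometric mean of per-class recall for a toy
    nearest-centroid learner trained on the kept majority instances plus
    ALL minority instances (the minority class is never reduced)."""
    kept = [p for p, keep in zip(majority, mask) if keep]
    if not kept:
        return 0.0  # a mask that removes every majority instance is useless
    c_maj, c_min = centroid(kept), centroid(minority)
    tnr = sum(dist2(p, c_maj) <= dist2(p, c_min) for p in majority) / len(majority)
    tpr = sum(dist2(p, c_min) < dist2(p, c_maj) for p in minority) / len(minority)
    return sqrt(tpr * tnr)


def evolutionary_undersample(majority, minority, pop_size=20, generations=30,
                             mutation_rate=0.02, seed=0):
    """Evolve a keep/remove mask over the majority class (hypothetical sketch).
    Returns the majority instances selected by the fittest mask."""
    rng = random.Random(seed)
    n = len(majority)

    def score(mask):
        return fitness(mask, majority, minority)

    # Each individual is a list of booleans: True means "keep this instance".
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best masks forward
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (best of 3 random masks each).
            a = max(rng.sample(ranked, 3), key=score)
            b = max(rng.sample(ranked, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return [p for p, keep in zip(majority, best) if keep]
```

Calling `evolutionary_undersample(majority, minority)` with two lists of feature tuples returns the retained majority instances; the reduced training set is that list plus the unchanged minority list. In practice the fitness would more plausibly be computed on held-out data with the actual learner of interest, which this sketch omits for brevity.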