A combined approach to tackle imbalanced data sets

Authors:
B. K. Sarkar; S. S. Sana; K. S. Chaudhuri
Affiliations:
Department of Information Technology, Birla Institute of Technology, Mesra, Ranchi, India; Department of Mathematics, Bhangar Mahavidyalaya C.U., Bhangar, India; Department of Mathematics, Jadavpur University, Kolkata, India
Venue:
International Journal of Hybrid Intelligent Systems
Year:
2012


Abstract

Learning from imbalanced data tends to produce high error rates. Several approaches have been developed to address this problem. In this paper, a new learning model that integrates the C4.5 classifier with evolutionary algorithms is introduced. To strengthen the model, two separate training partitions are drawn from each original data set by applying two distinct partitioning schemes proposed in this investigation, and the learning model uses them in sequence. More specifically, the hybrid system first applies the base method C4.5 to produce a set of rules R from a training set T_1, constructed by the first partitioning scheme. R is then passed to the genetic algorithm, which discovers another set of rules R_{GA} from a second training set T_2 that is disjoint from T_1 and is selected by the second partitioning scheme. Finally, some informative rules of R_{GA} are added to R. The system is tested on several real data sets from the UCI machine learning repository and compared with standard C4.5. Experimental results show that the system is well suited to imbalanced data sets, and that its performance does not degrade on balanced data sets.
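
The pipeline described in the abstract (rules R from T_1, a genetic algorithm producing R_{GA} from a disjoint T_2, then merging informative rules into R) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the abstract does not specify the partitioning schemes, the genetic operators, the fitness function, or the criterion for an "informative" rule, so everything here is an assumption. In particular, `base_rules` is a trivial placeholder standing in for C4.5, the alternating split stands in for the two proposed partitioning schemes, and the names `evolve`, `merge`, `min_cov`, and `min_prec` are hypothetical.

```python
import random
from collections import Counter

# A rule is (conds, label); conds[i] is a required value or None ("don't care").

def covers(rule, x):
    return all(c is None or c == v for c, v in zip(rule[0], x))

def stats(rule, data):
    """Return (covered, correct) counts of a rule on labelled examples."""
    hits = [y for x, y in data if covers(rule, x)]
    return len(hits), sum(y == rule[1] for y in hits)

def base_rules(t1, n_features):
    """Placeholder for C4.5 (it is NOT C4.5): majority class per value of feature 0."""
    counts = {}
    for x, y in t1:
        counts.setdefault(x[0], Counter())[y] += 1
    wild = (None,) * (n_features - 1)
    return [((v,) + wild, c.most_common(1)[0][0]) for v, c in sorted(counts.items())]

def fitness(rule, data):
    covered, correct = stats(rule, data)
    return correct / (covered + 1)  # assumed fitness: precision damped for tiny coverage

def evolve(seed_rules, t2, values, rng, generations=30, pop_size=20):
    """Toy GA on T_2: uniform crossover, point mutation, truncation selection."""
    labels = sorted({y for _, y in t2})
    pop = list(dict.fromkeys(seed_rules))
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.choice(pop), rng.choice(pop)
            conds = [rng.choice(pair) for pair in zip(a[0], b[0])]  # crossover
            i = rng.randrange(len(values))
            conds[i] = rng.choice(values[i] + [None])               # mutation
            label = rng.choice(labels) if rng.random() < 0.2 else a[1]
            children.append((tuple(conds), label))
        pool = list(dict.fromkeys(pop + children))  # drop duplicates, keep order
        pop = sorted(pool, key=lambda r: fitness(r, t2), reverse=True)[:pop_size]
    return pop

def merge(r, r_ga, t2, min_cov=2, min_prec=0.9):
    """Add to R the R_GA rules that look informative on T_2 (assumed thresholds)."""
    merged = list(r)
    for rule in r_ga:
        covered, correct = stats(rule, t2)
        if covered >= min_cov and correct / covered >= min_prec and rule not in merged:
            merged.append(rule)
    return merged

# Toy imbalanced data: "pos" occurs only when the features are ("a", "u").
data = ([(("a", "u"), "pos")] * 4 + [(("a", "v"), "neg")] * 8
        + [(("b", "u"), "neg")] * 8 + [(("b", "v"), "neg")] * 8)
t1, t2 = data[0::2], data[1::2]  # placeholder split, not the paper's schemes
values = [["a", "b"], ["u", "v"]]
R = base_rules(t1, n_features=2)
R_GA = evolve(R, t2, values, random.Random(0))
final_rules = merge(R, R_GA, t2)
```

On this toy data the placeholder base learner yields only majority-class rules, which is the kind of gap the second stage is meant to fill: any rule that `merge` appends has to meet the assumed coverage and precision thresholds on T_2, and the original rules in R are kept unchanged at the front of `final_rules`.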