Handling imbalanced data sets with a modification of Decorate algorithm

Authors:
Sotiris B. Kotsiantis
Affiliations:
Educational Software Development Laboratory, Department of Mathematics, University of Patras, Rio 26500, Greece
Venue:
International Journal of Computer Applications in Technology
Year:
2008

Abstract

Many real-world data sets exhibit skewed class distributions in which almost all instances belong to one class and far fewer belong to a smaller, but usually more interesting, class. A classifier induced from an imbalanced data set characteristically has a low error rate for the majority class and an unacceptably high error rate for the minority class. This paper first provides a systematic review of the methodologies that have been proposed to handle this problem. It then presents an experimental study of these methodologies together with a modification of the Decorate algorithm, and it concludes that such a framework can be a more valuable solution to the problem. Our method seems to permit improved identification of difficult small classes in predictive analysis, while keeping the classification ability for the majority class at an acceptable level.
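
The gap between majority-class and minority-class error rates described in the abstract can be illustrated with a toy sketch. This is not the paper's method or data: the 95/5 class split, the always-predict-majority classifier, and the helper name `per_class_error` are assumptions chosen purely for illustration.

```python
from collections import Counter


def per_class_error(y_true, y_pred):
    """Return {class_label: error rate}, computed separately for each true class."""
    totals = Counter(y_true)
    wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {label: wrong[label] / totals[label] for label in totals}


# Hypothetical toy labels: 95 majority-class (0) and 5 minority-class (1) instances.
y_true = [0] * 95 + [1] * 5

# A trivial classifier that always predicts the most frequent class.
majority_label = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_label] * len(y_true)

# The overall error looks small because the majority class dominates the count.
overall_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class error exposes that every minority instance is misclassified.
errors = per_class_error(y_true, y_pred)

print("overall error:", overall_error)
print("per-class error:", errors)
```

On this toy data the overall error rate is 5%, yet the error rate on the minority class is 100%, which is why overall accuracy alone can be a misleading measure on skewed class distributions.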