Selective Pre-processing of Imbalanced Data for Improving Classification Performance

Authors:
Jerzy Stefanowski;Szymon Wilk
Affiliations:
Institute of Computing Science, Poznań University of Technology, Poznań, Poland 60-965;Institute of Computing Science, Poznań University of Technology, Poznań, Poland 60-965 and Telfer School of Management, University of Ottawa, Ottawa, Canada K1N 6N5
Venue:
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Year:
2008

References (7)

Reduction Techniques for Instance-Based Learning Algorithms. Machine Learning.
Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter, Special issue on learning from imbalanced datasets.
A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, Special issue on learning from imbalanced datasets.
Rough Sets for Handling Imbalanced Data: Combining Filtering and Rule-based Classifiers. Fundamenta Informaticae, Special Issue on Concurrency Specification and Programming (CS&P 2005), Ruciane-Nida, Poland, 28-30 September 2005.
Experimental perspectives on learning from imbalanced data. Proceedings of the 24th International Conference on Machine Learning.
The class imbalance problem: A systematic study. Intelligent Data Analysis.
SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research.

Cited by (7)

Integrating selective pre-processing of imbalanced data with Ivotes ensemble. RSCTC'10: Proceedings of the 7th International Conference on Rough Sets and Current Trends in Computing.
Learning from imbalanced data in presence of noisy and borderline examples. RSCTC'10: Proceedings of the 7th International Conference on Rough Sets and Current Trends in Computing.
Data preparation techniques for improving rare class prediction. MAMECTIS/NOLASC/CONTROL/WAMUS'11: Proceedings of the 13th WSEAS international conference on mathematical methods, computational techniques and intelligent systems, and 10th WSEAS international conference on non-linear analysis, non-linear systems and chaos, and 7th WSEAS international conference on dynamical systems and control, and 11th WSEAS international conference on Wavelet analysis and multirate systems: recent researches in computational techniques, non-linear systems and control.
Predicting the need for CT imaging in children with minor head injury using an ensemble of Naive Bayes classifiers. Artificial Intelligence in Medicine.
BRACID: a comprehensive approach to learning rules from imbalanced data. Journal of Intelligent Information Systems.
DBFS: An effective Density Based Feature Selection scheme for small sample size and high dimensional imbalanced data sets. Data & Knowledge Engineering.
IIvotes ensemble for imbalanced data. Intelligent Data Analysis, Combined Learning Methods and Mining Complex Data.

Abstract

In this paper we discuss the problems of constructing classifiers from imbalanced data. We describe a new approach to selective pre-processing of imbalanced data that combines local over-sampling of the minority class with filtering of difficult examples from the majority classes. In experiments focused on rule-based and tree-based classifiers, we compare our approach with two related pre-processing methods: NCR (Neighborhood Cleaning Rule) and SMOTE (Synthetic Minority Over-sampling Technique). The results show that NCR is too strongly biased toward the minority class, which deteriorates specificity and overall accuracy, whereas SMOTE and our approach do not exhibit such behavior. An analysis of the degree to which the original class distribution is modified also reveals that our approach introduces less extensive changes than SMOTE.
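The abstract describes the approach only at a high level. As an illustration of the general idea (local over-sampling of the minority class combined with filtering of difficult majority examples), here is a minimal Python sketch built on assumptions that are not taken from the paper: Euclidean distance, k = 3 nearest neighbors, an example counts as "difficult" when the majority vote of its neighbors disagrees with its label, and each difficult minority example is replicated once per majority-class neighbor. The names `k_nearest`, `is_safe`, `selective_preprocess` and the toy data are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy data: (feature vector, class label); "+" is the minority class.
example = [
    ((0.0, 0.0), "-"), ((0.1, 0.0), "-"), ((0.0, 0.1), "-"),
    ((0.1, 0.1), "+"),    # minority example surrounded by majority examples
    ((1.0, 1.0), "+"), ((1.1, 1.0), "+"), ((1.0, 1.1), "+"),
    ((1.05, 1.05), "-"),  # majority example surrounded by minority examples
]


def k_nearest(data, i, k):
    """Return indices of the k nearest neighbors of data[i] (Euclidean), excluding i."""
    xi = data[i][0]
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: math.dist(xi, data[j][0]))
    return others[:k]


def is_safe(data, i, k):
    """Assumed definition: 'safe' if the majority vote of the k neighbors matches the label."""
    votes = Counter(data[j][1] for j in k_nearest(data, i, k))
    return votes.most_common(1)[0][0] == data[i][1]


def selective_preprocess(data, minority, k=3):
    """Sketch: drop difficult majority examples, replicate difficult minority ones.

    All neighborhoods are computed on the ORIGINAL data, before any change.
    """
    safe = [is_safe(data, i, k) for i in range(len(data))]
    result = []
    for i, (x, y) in enumerate(data):
        if y != minority:
            if safe[i]:
                result.append((x, y))  # keep safe majority examples; drop the rest
            continue
        result.append((x, y))  # every minority example is kept
        if not safe[i]:
            # Assumed rule: one extra copy per majority-class neighbor.
            n_majority = sum(1 for j in k_nearest(data, i, k) if data[j][1] != minority)
            result.extend([(x, y)] * n_majority)
    return result


if __name__ == "__main__":
    out = selective_preprocess(example, minority="+")
    print(Counter(y for _, y in out))  # class counts after pre-processing
```

On the toy data, the majority example inside the minority cluster is removed and the isolated minority example gains three extra copies, giving `Counter({'+': 7, '-': 3})`. Safe examples of both classes are left untouched, which is one way to keep changes to the class distribution local rather than global.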