Borderline over-sampling for imbalanced data classification

Authors:
Hien M. Nguyen; Eric W. Cooper; Katsuari Kamei
Affiliations:
Graduate School of Science and Engineering, Ritsumeikan University, 1-1-1 Noji Higashi, Kusatsu, Shiga 525-8577, Japan; College of Information Science and Engineering, Ritsumeikan University, 1-1-1 Noji Higashi, Kusatsu, Shiga 525-8577, Japan; College of Information Science and Engineering, Ritsumeikan University, 1-1-1 Noji Higashi, Kusatsu, Shiga 525-8577, Japan
Venue:
International Journal of Knowledge Engineering and Soft Data Paradigms
Year:
2011

Abstract

Traditional classification algorithms usually predict the minority class of imbalanced data sets with poor accuracy. This paper proposes a new method for dealing with imbalanced data sets by over-sampling the borderline minority class instances. A Support Vector Machine (SVM) classifier is then trained to predict future instances. Compared with other over-sampling methods, the proposed method focuses only on the minority class instances that lie along the decision boundary, because this region is the most crucial for establishing that boundary. Furthermore, the artificial minority instances are generated so that regions of the minority class with fewer majority class instances are expanded by extrapolation, while elsewhere the current boundary of the minority class is consolidated by interpolation. Experimental results show that the proposed method achieves better performance than other over-sampling methods.
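
The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions the abstract does not supply. The abstract does not state how borderline instances are identified, so their indices are passed in directly. The rule "extrapolate when fewer than half of the k nearest neighbours belong to the majority class, otherwise interpolate" is likewise an assumed reading of "regions with fewer majority class instances". The function name `synthesize_borderline` and the parameters `k`, `per_point`, and `seed` are hypothetical. Plain Python is used to keep the example free of dependencies.

```python
import math
import random


def synthesize_borderline(minority, majority, borderline_idx, k=3, per_point=2, seed=0):
    """Create synthetic minority points around the given borderline instances.

    minority, majority: lists of equal-length numeric tuples.
    borderline_idx: indices into `minority` (assumed to be known already).
    Assumes `minority` holds at least two instances.
    """
    rng = random.Random(seed)
    # Minority instances come first, so index i in `minority` is index i here.
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    synthetic = []
    for i in borderline_idx:
        x = minority[i]
        # k nearest neighbours of x among all instances, excluding x itself.
        neighbours = sorted(
            (item for j, item in enumerate(labelled) if j != i),
            key=lambda item: math.dist(x, item[0]),
        )[:k]
        n_major = sum(1 for _, label in neighbours if label == 0)
        # k nearest minority neighbours of x, used as direction anchors.
        minority_nn = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        # Assumed rule: few majority neighbours -> expand outward (extrapolate);
        # otherwise stay between existing minority points (interpolate).
        extrapolate = n_major < k / 2
        sign = -1.0 if extrapolate else 1.0
        for _ in range(per_point):
            nn = rng.choice(minority_nn)
            r = rng.random()  # step size in [0, 1)
            # Interpolation: x + r * (nn - x); extrapolation: x + r * (x - nn).
            synthetic.append(tuple(xi + sign * r * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic
```

In this sketch an interpolated point falls on the segment between a borderline instance and one of its minority neighbours, while an extrapolated point falls on the same line but on the far side of the borderline instance, pushing the minority region outward. The synthetic points would then be added to the training set before fitting the SVM.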