DBSMOTE: Density-Based Synthetic Minority Over-sampling TEchnique

Authors:
Chumphol Bunkhumpornpat;Krung Sinapiromsaran;Chidchanok Lursinsap
Affiliations:
Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok, Thailand 10330 (all three authors)
Venue:
Applied Intelligence
Year:
2012

Abstract

A dataset exhibits the class imbalance problem when a target class has very few instances relative to the other classes. A trivial classifier typically fails to detect the minority class because of its extremely low incidence rate. In this paper, we propose a new over-sampling technique called DBSMOTE. The technique relies on a density-based notion of clusters and is designed to over-sample an arbitrarily shaped cluster discovered by DBSCAN. DBSMOTE generates synthetic instances along the shortest path from each positive instance to a pseudo-centroid of its minority-class cluster. Consequently, the synthetic instances are dense near this centroid and sparse far from it. Our experimental results show that DBSMOTE improves precision, F-value, and AUC more effectively than SMOTE, Borderline-SMOTE, and Safe-Level-SMOTE on imbalanced datasets.
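
The generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that DBSCAN has already isolated one minority-class cluster, that two instances are connected when they lie within a radius `eps` of each other, that the pseudo-centroid is the cluster member nearest to the cluster mean, and that each synthetic instance is placed at a random position on a randomly chosen edge of the shortest path. All function names and parameters here are hypothetical.

```python
import heapq
import math
import random


def euclid(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pseudo_centroid(cluster):
    """Index of the cluster member closest to the cluster mean (assumed definition)."""
    dim = len(cluster[0])
    mean = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    return min(range(len(cluster)), key=lambda i: euclid(cluster[i], mean))


def shortest_path_tree(cluster, source, eps):
    """Dijkstra over the eps-neighbourhood graph; returns each node's parent."""
    n = len(cluster)
    cost = [math.inf] * n
    parent = [None] * n
    cost[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > cost[u]:
            continue  # stale heap entry
        for v in range(n):
            if v == u:
                continue
            w = euclid(cluster[u], cluster[v])
            # Only instances within eps of each other share an edge (assumption).
            if w <= eps and d + w < cost[v]:
                cost[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (cost[v], v))
    return parent


def oversample_cluster(cluster, eps, seed=0):
    """One synthetic instance per cluster member, placed on its path to the pseudo-centroid."""
    rng = random.Random(seed)
    center = pseudo_centroid(cluster)
    parent = shortest_path_tree(cluster, center, eps)
    synthetic = []
    for i in range(len(cluster)):
        if i == center or parent[i] is None:
            continue  # skip the pseudo-centroid itself and unreachable members
        edges = []
        u = i
        while parent[u] is not None:  # walk back to the pseudo-centroid
            edges.append((u, parent[u]))
            u = parent[u]
        a, b = rng.choice(edges)  # random edge on the path (assumption)
        t = rng.random()          # random position along that edge
        synthetic.append(
            tuple(x + t * (y - x) for x, y in zip(cluster[a], cluster[b]))
        )
    return synthetic


# Toy cluster: five collinear minority instances, neighbours one unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
samples = oversample_cluster(points, eps=1.5)
```

Edges close to the pseudo-centroid lie on more shortest paths than edges at the fringe, so under these assumptions they are sampled more often, which is one way to obtain the dense-near-centroid, sparse-far-away pattern the abstract describes.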