Under-sampling the majority class is an effective way to handle imbalanced data sets, but it discards potentially useful information. To address this deficiency, we propose a Clustering-Based Bagging Algorithm (CBBA). In CBBA, the majority class is clustered into several groups and instances are randomly sampled from each group. The sampled instances are combined with the minority-class instances to train a base classifier, and final predictions are produced by combining the base classifiers. Experimental results show that our approach outperforms the under-sampling method.
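For concreteness, the pipeline the abstract describes might look like the following sketch in Python with scikit-learn. Everything beyond the abstract's outline is an illustrative assumption: the choice of k-means for clustering, decision trees as base learners, majority voting to combine predictions, and all names and parameters (cbba_fit, k_clusters, n_estimators, and so on) are hypothetical, not details taken from the paper.

```python
# A minimal sketch of the CBBA idea: cluster the majority class, sample from
# each cluster, train base classifiers on sampled-majority + full-minority
# data, and combine their predictions. Design choices here are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def cbba_fit(X, y, minority_label, k_clusters=5, n_estimators=10, seed=None):
    """Each base classifier sees all minority instances plus a random
    per-cluster sample of the majority class."""
    rng = np.random.default_rng(seed)
    X_min, y_min = X[y == minority_label], y[y == minority_label]
    X_maj, y_maj = X[y != minority_label], y[y != minority_label]

    # Cluster the majority class so sampling respects its structure
    clusters = KMeans(n_clusters=k_clusters, n_init=10).fit_predict(X_maj)

    # Draw roughly len(X_min) / k_clusters majority instances per cluster,
    # so each training set is approximately balanced (an assumed heuristic)
    per_cluster = max(1, len(X_min) // k_clusters)
    ensemble = []
    for _ in range(n_estimators):
        parts_X, parts_y = [X_min], [y_min]
        for c in range(k_clusters):
            idx = np.flatnonzero(clusters == c)
            take = rng.choice(idx, size=min(per_cluster, len(idx)),
                              replace=False)
            parts_X.append(X_maj[take])
            parts_y.append(y_maj[take])
        Xs, ys = np.vstack(parts_X), np.concatenate(parts_y)
        ensemble.append(DecisionTreeClassifier().fit(Xs, ys))
    return ensemble

def cbba_predict(ensemble, X):
    # Combine base classifiers by majority vote (assumes integer labels)
    votes = np.stack([clf.predict(X) for clf in ensemble])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```

Under these assumptions, each base learner trains on a roughly balanced set, while drawing from every majority-class cluster keeps information that plain random under-sampling would throw away.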