Expert Systems with Applications: An International Journal
Classification is an important task in data mining. Class imbalance has been reported to hinder the performance of standard classification models. However, our study shows that class imbalance may not be the only cause of poor performance; the underlying complexity of the problem may play a more fundamental role. In this paper, a decision tree method based on the Kolmogorov-Smirnov statistic (K-S tree) is proposed to segment the training data so that a complex problem can be divided into several easier sub-problems in which class imbalance becomes less challenging. The K-S tree is also used to perform feature selection, which not only selects relevant variables but also removes redundant ones. After segmentation, a two-way re-sampling method is applied at the segment level to empirically determine the optimal sampling percentage, and the rebalanced data are used to fit logistic regression models, also at the segment level. The effectiveness of the proposed method is demonstrated through its application to property refinance prediction.
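As an illustration of how a Kolmogorov-Smirnov statistic can drive a tree split, the following is a minimal sketch, not the paper's implementation. It assumes a binary target and numeric features, and it assumes the split point is the threshold where the gap between the two classes' empirical distribution functions is largest; the abstract does not spell out the exact criterion. The names `ks_split` and `best_feature` are hypothetical.

```python
import numpy as np


def ks_split(x, y):
    """Return (ks_statistic, threshold) for one numeric feature and binary labels.

    Assumption: the candidate split is the value where the empirical CDFs
    of the two classes differ the most (the two-sample K-S statistic).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y).astype(bool)
    if y.all() or not y.any():
        raise ValueError("both classes must be present")

    thresholds = np.unique(x)  # sorted distinct feature values
    pos = np.sort(x[y])
    neg = np.sort(x[~y])

    # Empirical CDF of each class, evaluated at every candidate threshold.
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / neg.size

    gaps = np.abs(cdf_pos - cdf_neg)
    best = int(np.argmax(gaps))
    return float(gaps[best]), float(thresholds[best])


def best_feature(X, y):
    """Pick the column with the largest K-S statistic.

    Returns (column_index, ks_statistic, threshold).
    """
    X = np.asarray(X, dtype=float)
    scores = [ks_split(X[:, j], y) for j in range(X.shape[1])]
    j = max(range(len(scores)), key=lambda k: scores[k][0])
    return j, scores[j][0], scores[j][1]
```

Applying `best_feature` recursively to the rows on each side of the chosen threshold would yield segments of the kind the abstract describes. Features whose K-S statistic stays low at every node are natural candidates to drop, which is one plausible reading of how the tree doubles as a feature selector.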