A highly skewed class distribution usually causes a learned classifier to predict the majority class much more often than the minority class, because most classifiers are designed to maximize accuracy. In many domains, such as medical diagnosis, the minority class is the class of primary interest, so this behavior is unacceptable. In this paper, we compare two basic strategies for dealing with data that have a skewed class distribution and non-uniform misclassification costs. One strategy is cost-sensitive learning; the other uses sampling to create a more balanced class distribution in the training set. We compare two sampling techniques, up-sampling and down-sampling, to the cost-sensitive learning approach. The purpose of this paper is to determine which technique produces the best overall classifier, and under what circumstances.
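To make the three approaches concrete, here is a minimal Python sketch. It is an illustration, not the paper's implementation. It assumes a binary problem and uniform random selection. It also implements the cost-sensitive step as a probability threshold, using the standard decision-theoretic rule for zero-cost correct predictions. The function names (`up_sample`, `down_sample`, `cost_sensitive_threshold`) and the cost parameters are hypothetical.

```python
import random


def _split_indices(labels, minority):
    """Return (minority indices, majority indices) for a binary label list."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def up_sample(examples, labels, minority, seed=0):
    """Up-sampling: duplicate randomly chosen minority examples (with
    replacement) until the minority class matches the majority class in size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_extra = max(0, len(majority_idx) - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = majority_idx + minority_idx + extra
    return [examples[i] for i in keep], [labels[i] for i in keep]


def down_sample(examples, labels, minority, seed=0):
    """Down-sampling: discard randomly chosen majority examples until the
    majority class matches the minority class in size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_keep = min(len(majority_idx), len(minority_idx))
    keep = rng.sample(majority_idx, n_keep) + minority_idx
    return [examples[i] for i in keep], [labels[i] for i in keep]


def cost_sensitive_threshold(cost_fp, cost_fn):
    """Assumed cost-sensitive rule: with zero cost for correct predictions,
    predict the minority (positive) class when its estimated probability is
    at least cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)
```

For example, with 90 majority and 10 minority training examples, `up_sample` returns 90 examples of each class and `down_sample` returns 10 of each. With a false negative nine times as costly as a false positive, `cost_sensitive_threshold(1, 9)` gives 0.1, so the classifier labels an example as minority whenever its estimated minority probability reaches 0.1 rather than the accuracy-maximizing 0.5.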