Does cost-sensitive learning beat sampling for classifying rare classes?

Authors:
Kate McCarthy; Bibi Zabar; Gary Weiss
Affiliations:
Fordham University, Bronx, NY (all three authors)
Venue:
UBDM '05 Proceedings of the 1st international workshop on Utility-based data mining
Year:
2005

Abstract

A highly skewed class distribution usually causes a learned classifier to predict the majority class much more often than the minority class. This happens because most classifiers are designed to maximize accuracy. In many applications, such as medical diagnosis, the minority class is the class of primary interest, so this behavior is unacceptable. In this paper, we compare two basic strategies for dealing with data that have a skewed class distribution and non-uniform misclassification costs. One strategy is based on cost-sensitive learning, while the other employs sampling to create a more balanced class distribution in the training set. We compare two sampling techniques, up-sampling and down-sampling, with the cost-sensitive learning approach. The purpose of this paper is to determine which technique produces the best overall classifier, and under what circumstances.
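The abstract contrasts three ways of handling a rare class. The sketch below shows each in its simplest two-class form, under assumptions that are ours rather than the paper's: up-sampling duplicates randomly chosen minority examples until the classes are balanced, down-sampling discards randomly chosen majority examples until they are balanced, and the cost-sensitive alternative is illustrated with the standard expected-cost decision threshold (correct predictions assumed to cost nothing). The function names `random_upsample`, `random_downsample`, `cost_sensitive_threshold`, and `predict_minority` are hypothetical, and the toy data are invented; this is not the authors' experimental code. Cost-sensitive learners can also incorporate costs during training, so the threshold rule is only the simplest way to show how a cost ratio shifts predictions toward the minority class.

```python
import random
from collections import Counter


def random_upsample(rows, labels, minority, seed=0):
    """Duplicate randomly chosen minority rows (sampling with replacement)
    until the minority class is as large as all other rows combined.
    Assumes a two-class problem with at least one minority row."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_count = len(labels) - len(minority_idx)
    extra = [rng.choice(minority_idx)
             for _ in range(majority_count - len(minority_idx))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_downsample(rows, labels, minority, seed=0):
    """Discard randomly chosen majority rows (sampling without replacement)
    until the majority class is no larger than the minority class."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    kept_majority = rng.sample(majority_idx,
                               min(len(minority_idx), len(majority_idx)))
    keep = sorted(minority_idx + kept_majority)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def cost_sensitive_threshold(cost_fn, cost_fp):
    """Minority-class probability at or above which predicting the minority
    class has no higher expected cost than predicting the majority class:
    p * cost_fn >= (1 - p) * cost_fp  <=>  p >= cost_fp / (cost_fp + cost_fn).
    Assumes correct predictions cost nothing."""
    return cost_fp / (cost_fp + cost_fn)


def predict_minority(p_minority, cost_fn=1.0, cost_fp=1.0):
    """Return True if the minority class should be predicted.
    With equal costs the threshold is 0.5, the accuracy-maximizing rule."""
    return p_minority >= cost_sensitive_threshold(cost_fn, cost_fp)


if __name__ == "__main__":
    rows = list(range(100))                  # toy feature values (invented)
    labels = ["pos"] * 5 + ["neg"] * 95      # 5% minority class
    _, up = random_upsample(rows, labels, "pos")
    _, down = random_downsample(rows, labels, "pos")
    print(sorted(Counter(up).items()))       # [('neg', 95), ('pos', 95)]
    print(sorted(Counter(down).items()))     # [('neg', 5), ('pos', 5)]
    # Hypothetical costs: a false negative is 10x as costly as a false positive.
    print(cost_sensitive_threshold(cost_fn=10, cost_fp=1))  # 1/11, about 0.091
    print(predict_minority(0.2),
          predict_minority(0.2, cost_fn=10, cost_fp=1))     # False True
```

With equal costs the threshold is 0.5, which is the accuracy-maximizing behavior the abstract identifies as the source of under-predicting the minority class; raising the cost of a false negative lowers the threshold. Sampling pushes in a roughly similar direction indirectly, by changing the class proportions the learner sees rather than the decision rule, at the price of duplicating minority examples (up-sampling) or throwing away majority examples (down-sampling).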