Cost-Based Sampling of Individual Instances

  • Authors:
  • William Klement; Peter Flach; Nathalie Japkowicz; Stan Matwin

  • Affiliations:
  • School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada K1N 6N5; Department of Computer Science, University of Bristol, Bristol, United Kingdom BS8 1UB; School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada K1N 6N5; School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada K1N 6N5 and Institute of Computer Science, Polish Academy of Sciences, Poland

  • Venue:
  • Canadian AI '09 Proceedings of the 22nd Canadian Conference on Artificial Intelligence: Advances in Artificial Intelligence
  • Year:
  • 2009

Abstract

In many practical domains, misclassification costs can differ greatly and may be represented by class ratios; however, most learning algorithms struggle with skewed class distributions. The difficulty is attributed to designing classifiers to maximize accuracy. Researchers have proposed several techniques to address this problem, including under-sampling the majority class, employing a probabilistic algorithm, and adjusting the classification threshold. In this paper, we propose a general sampling approach that assigns weights to individual instances according to the cost function. This approach helps reveal the relationship between classification performance and class ratios, and allows the identification of an appropriate class distribution for which the learning method achieves reasonable performance on the data. Our results show that combining an ensemble of Naive Bayes classifiers with threshold selection and under-sampling techniques works well for imbalanced data.
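The idea of weighting individual instances by cost can be illustrated with a minimal sketch. It is not the authors' implementation; the abstract does not specify the cost function or the sampling procedure. The function names, the per-class cost table, and the toy data below are all assumptions made for illustration: each instance receives a weight equal to an assumed cost of misclassifying its class, and instances are then drawn with replacement in proportion to those weights.

```python
import random
from collections import Counter


def instance_weights(labels, cost_by_class):
    """Weight each instance by the (assumed) cost of misclassifying its class."""
    return [cost_by_class[y] for y in labels]


def cost_based_sample(instances, labels, cost_by_class, n, seed=0):
    """Draw n instances with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    weights = instance_weights(labels, cost_by_class)
    idx = rng.choices(range(len(instances)), weights=weights, k=n)
    return [instances[i] for i in idx], [labels[i] for i in idx]


# Toy data (hypothetical): 90 majority instances (class 0), 10 minority (class 1).
labels = [0] * 90 + [1] * 10
instances = list(range(100))

# Assumed costs: misclassifying a minority instance is 9 times as costly.
costs = {0: 1.0, 1: 9.0}

sampled_instances, sampled_labels = cost_based_sample(
    instances, labels, costs, n=1000, seed=42
)
counts = Counter(sampled_labels)
print(counts)  # the two classes should appear in roughly equal numbers
```

With these assumed costs, the total weight of each class is equal (90 × 1 and 10 × 9), so the sample is approximately balanced even though the original data has a 9:1 class ratio. Varying the cost table changes the effective class distribution seen by the learner, which is the relationship between costs and class ratios that the abstract refers to.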