Learning with Limited Minority Class Data

  • Authors:
  • Taghi M. Khoshgoftaar;Chris Seiffert;Jason Van Hulse;Amri Napolitano;Andres Folleco

  • Venue:
  • ICMLA '07 Proceedings of the Sixth International Conference on Machine Learning and Applications
  • Year:
  • 2007

Abstract

A practical problem in data mining and machine learning is the limited availability of data. In a binary classification problem, for example, examples of one class are often abundant while examples of the other class are in short supply. Examples from one class, typically the positive class, can be limited because of the financial cost or time required to collect them. This work presents a comprehensive empirical study of learning when examples from one class are extremely rare but examples of the other class(es) are plentiful. Specifically, we address how many examples from the abundant class should be used when training a classifier on data in which one class is very rare. Nearly one million classifiers were built and evaluated to generate the results presented in this work. Our results demonstrate that the often-used 'even distribution' is not optimal when dealing with such rare events.
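
To make the question of "how many examples from the abundant class" concrete, the following is a minimal sketch of building training sets at several class distributions by randomly undersampling the abundant class. It is not the authors' procedure: the abstract does not state the sampling method, the class ratios tested, or the learners used. The function name `undersample_majority`, the parameter `minority_fraction`, the toy data, and the list of fractions are all illustrative assumptions, as is reading 'even distribution' as a minority fraction of 0.5.

```python
import random


def undersample_majority(minority, majority, minority_fraction, seed=0):
    """Build a training set in which roughly `minority_fraction` of the
    examples come from the rare class.

    Hypothetical helper for illustration only: every rare-class example is
    kept, and abundant-class examples are sampled without replacement.
    """
    if not 0.0 < minority_fraction <= 1.0:
        raise ValueError("minority_fraction must be in (0, 1]")
    n_min = len(minority)
    # Number of abundant-class examples needed to reach the target fraction.
    n_maj = round(n_min * (1.0 - minority_fraction) / minority_fraction)
    n_maj = min(n_maj, len(majority))  # cannot sample more than are available
    rng = random.Random(seed)
    sampled = rng.sample(majority, n_maj)
    # Label rare-class examples 1 and abundant-class examples 0.
    return [(x, 1) for x in minority] + [(x, 0) for x in sampled]


# Toy data (assumed): 10 rare positives and 1000 abundant negatives.
positives = [("pos", i) for i in range(10)]
negatives = [("neg", i) for i in range(1000)]

# Assumed set of target distributions; 0.5 corresponds to an even split.
for fraction in (0.5, 0.35, 0.25, 0.1):
    train = undersample_majority(positives, negatives, fraction)
    n_pos = sum(label for _, label in train)
    print(f"target={fraction:.2f} size={len(train)} positives={n_pos}")
```

In a study of this kind, each such training set would be passed to a classifier and the resulting models compared on held-out data; which learners and performance metrics to use is left open here because the abstract does not specify them.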