Skewed Class Distributions and Mislabeled Examples

  • Authors:
  • Jason Van Hulse;Taghi M. Khoshgoftaar;Amri Napolitano

  • Venue:
  • ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
  • Year:
  • 2007

Abstract

Both imbalanced data and class noise are problems that have received attention in data mining research; however, learning from imbalanced data with labeling errors has not been adequately addressed. We present systematic experimentation on imbalanced datasets with simulated class noise and evaluate the impact on various classification algorithms. Our results show that class noise is a significant detriment to learning from skewed data, but more importantly, we demonstrate that the class in which the noise is located is critical. This has significant repercussions for noise treatment procedures, which often handle noise equally in both classes. In addition, an examination of 11 classifiers demonstrates that the learners react very differently when confronted with class noise.
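
To make the idea of simulated class noise on skewed data concrete, the following is a minimal sketch of one way to flip labels in a chosen class. It is an illustration only and does not reproduce the authors' experimental procedure, which the abstract does not specify. The function name `inject_class_noise`, its parameters, and the toy 10/90 class split are all assumptions introduced here.

```python
import random


def inject_class_noise(labels, from_class, to_class, rate, seed=0):
    """Return a copy of `labels` with a fraction `rate` of the
    `from_class` examples relabeled as `to_class`.

    Hypothetical helper for illustration; not the paper's procedure.
    """
    rng = random.Random(seed)
    # Indices of the examples that are eligible to be corrupted.
    candidates = [i for i, y in enumerate(labels) if y == from_class]
    n_flip = int(round(rate * len(candidates)))
    flipped = set(rng.sample(candidates, n_flip))
    return [to_class if i in flipped else y for i, y in enumerate(labels)]


# Toy imbalanced label vector (assumed): 10 minority (1), 90 majority (0).
clean = [1] * 10 + [0] * 90

# Corrupt the same number of labels (5), but in different classes.
minority_noise = inject_class_noise(clean, from_class=1, to_class=0, rate=0.5)
majority_noise = inject_class_noise(clean, from_class=0, to_class=1, rate=5 / 90)

print(sum(clean), sum(minority_noise), sum(majority_noise))
```

In this toy setup, five mislabeled examples remove half of the apparent minority class when they come from the minority class, whereas the same five errors coming from the majority class add spurious minority examples instead. That asymmetry is one intuition for why the location of the noise can matter on skewed data.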