Margin-based over-sampling method for learning from imbalanced datasets

  • Authors:
  • Xiannian Fan;Ke Tang;Thomas Weise

  • Affiliations:
  • Nature Inspired Computational and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China;Nature Inspired Computational and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China;Nature Inspired Computational and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China

  • Venue:
  • PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
  • Year:
  • 2011

Abstract

Learning from imbalanced datasets has drawn increasing attention from both theoretical and practical perspectives. Over-sampling is a popular and simple method for imbalanced learning. In this paper, we show that over-sampling algorithms carry an inherent risk in terms of the large margin principle. We then propose a new synthetic over-sampling method, named Margin-guided Synthetic Over-sampling (MSYN), to reduce this risk. MSYN improves learning with respect to the data distributions, guided by a margin-based rule. An empirical study verifies the efficacy of MSYN.
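
For readers unfamiliar with synthetic over-sampling, the following is a minimal sketch of the generic interpolation idea (in the style of SMOTE), not of MSYN itself: the abstract does not specify the margin-based rule, so none is implemented here. The function name `interpolate_oversample`, its parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions made for illustration.

```python
import random


def interpolate_oversample(minority, n_new, k=3, seed=0):
    """Hypothetical helper: create n_new synthetic minority points.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    This is generic interpolation-based over-sampling, NOT the MSYN rule.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (illustrative values only).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = interpolate_oversample(minority, n_new=4)
```

Note that this sketch never looks at the majority class when placing new points; a margin-guided method such as the one the abstract proposes would additionally take the margin into account when deciding which synthetic points to keep or generate.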