Margin calibration in SVM class-imbalanced learning

  • Authors:
  • Chan-Yun Yang; Jr-Syu Yang; Jian-Jun Wang

  • Affiliations:
  • Department of Mechanical Engineering, Technology and Science Institute of Northern Taiwan, No. 2 Xue-Yuan Road, Beitou, Taipei 11202, Taiwan, ROC; Department of Mechanical and Electro-Mechanical Engineering, Tamkang University, No. 151 Ying-Chuan Road, Tamsui, Taipei County 25137, Taiwan, ROC; School of Mathematics and Statistics, Southwest University, Chongqing 400715, PR China

  • Venue:
  • Neurocomputing
  • Year:
  • 2009

Abstract

Learning from imbalanced datasets is an important practical issue in machine learning, and support vector machines (SVMs) are no exception. This study first examines a well-known reference model for the problem, proposed by Veropoulos et al. Viewed in terms of its loss function, this cost-sensitive prototype is identified as a penalty-regularized model. Intuitively, the loss function can change not only the penalty but also the margin in order to recover the biased decision boundary. This study focuses mainly on the effect of the margin and extends the model to a more general modification. As in the prototype, the modification first adopts an inversely proportional regularized penalty to re-weight the imbalanced classes. In addition to the penalty regularization, the modification employs a margin compensation that makes the margin lopsided and thereby allows the decision boundary to drift. Two regularization factors, the penalty and the margin, are thus suggested for achieving an unbiased classification. The margin compensation, in association with the penalty regularization, is used to calibrate and refine the biased decision boundary and further reduce the bias. With the area under the receiver operating characteristic curve (AuROC) as the performance measure, the modification shows relatively higher scores than the reference model, even when the reference model achieves its optimal performance. Some useful empirically observed characteristics are also reported, which may be convenient for future applications. The theoretical descriptions and experimental validations together show the proposed model's potential to compete for highly unbiased accuracy on complex imbalanced datasets.
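
The two regularization factors named in the abstract, a class-dependent penalty and a class-dependent margin, can be illustrated with a small hinge-loss sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption made for illustration: the function names, the weighting rule (total count divided by the number of classes times the class count), and the margin values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def inverse_proportional_penalties(labels, base_c=1.0):
    """Return a per-class penalty that is inversely proportional to class size.

    Assumed rule (not from the paper): C_k = base_c * n_total / (n_classes * n_k).
    """
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {k: base_c * n_total / (n_classes * n_k) for k, n_k in counts.items()}


def calibrated_hinge_loss(score, label, penalties, margins):
    """Hinge loss with a class-dependent penalty and a class-dependent margin.

    `label` is +1 or -1 and `score` is the decision value f(x).
    With margins of 1.0 for both classes this reduces to a plain
    cost-sensitive hinge loss (penalty re-weighting only).
    """
    return penalties[label] * max(0.0, margins[label] - label * score)


# Toy imbalanced labels: 2 positives (minority), 8 negatives (majority).
labels = [+1] * 2 + [-1] * 8
penalties = inverse_proportional_penalties(labels)

# Hypothetical lopsided margins: the minority class must clear a larger margin.
margins = {+1: 1.5, -1: 1.0}

# Both points sit at distance 1.2 on the correct side of the boundary.
loss_pos = calibrated_hinge_loss(1.2, +1, penalties, margins)   # still penalized
loss_neg = calibrated_hinge_loss(-1.2, -1, penalties, margins)  # outside its margin
print(penalties, loss_pos, loss_neg)
```

In this sketch the minority point is still penalized although it lies at the same distance from the boundary as the majority point. That asymmetry is the kind of pressure that would push a learned boundary away from the minority class, on top of what the penalty re-weighting alone provides.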