Margin calibration in SVM class-imbalanced learning

Authors:
Chan-Yun Yang; Jr-Syu Yang; Jian-Jun Wang
Affiliations:
Department of Mechanical Engineering, Technology and Science Institute of Northern Taiwan, No. 2 Xue-Yuan Road, Beitou, Taipei 11202, Taiwan, ROC; Department of Mechanical and Electro-Mechanical Engineering, Tamkang University, No. 151 Ying-Chuan Road, Tamsui, Taipei County 25137, Taiwan, ROC; School of Mathematics and Statistics, Southwest University, Chongqing 400715, PR China
Venue:
Neurocomputing
Year:
2009

Abstract

Learning from imbalanced datasets is an important practical issue in machine learning, and support vector machines (SVMs) are no exception. This study first examines a well-known reference model for the problem, proposed by Veropoulos et al. Viewed through its loss function, this cost-sensitive prototype is identified as a penalty-regularized model. Intuitively, the loss function can change not only the penalty but also the margin to correct the biased decision boundary. This study focuses mainly on the effect of the margin and extends the model to a more general modification. As in the prototype, the modification first adopts an inversely proportional regularized penalty to re-weight the imbalanced classes. In addition to the penalty regularization, it then employs a margin compensation that makes the margin lopsided, which allows the decision boundary to drift. Two regularization factors, the penalty and the margin, are hence suggested for achieving an unbiased classification. The margin compensation, in association with the penalty regularization, is used to calibrate and refine the decision boundary and further reduce the bias. With the area under the receiver operating characteristic curve (AuROC) as the performance measure, the modification shows relatively higher scores than the reference model, even when the reference model is at its optimal performance. Some useful characteristics found empirically are also reported, which may be convenient for future applications. The theoretical descriptions and experimental validations show the proposed model's potential to compete for highly unbiased accuracy on complex imbalanced datasets.
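
The two regularization factors named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulation, so everything below is an assumption made for illustration: the function names are hypothetical, the penalty uses the common n / (k * n_class) heuristic as one possible inversely proportional weighting, and the margin compensation is modeled as a class-dependent target margin inside a hinge loss.

```python
from collections import Counter


def inverse_proportional_penalties(labels, base_penalty=1.0):
    """Return one penalty per class, inversely proportional to class size.

    Assumption: penalty_c = base_penalty * n / (k * n_c), where n is the
    number of samples, k the number of classes, and n_c the class count.
    The paper may use a different normalisation.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return {cls: base_penalty * n / (k * cnt) for cls, cnt in counts.items()}


def lopsided_hinge_loss(scores, labels, penalties, margins):
    """Sum of class-weighted hinge losses with class-dependent margins.

    Assumption: each sample contributes
        penalties[y] * max(0, margins[y] - y * score),
    so a larger margin for one class pushes the boundary away from it.
    Labels are expected to be +1 or -1.
    """
    total = 0.0
    for score, y in zip(scores, labels):
        total += penalties[y] * max(0.0, margins[y] - y * score)
    return total


# Hypothetical toy data: one minority (+1) sample, three majority (-1) samples.
labels = [1, -1, -1, -1]
scores = [0.5, -2.0, -0.5, 0.2]

penalties = inverse_proportional_penalties(labels)
symmetric = lopsided_hinge_loss(scores, labels, penalties, {1: 1.0, -1: 1.0})
lopsided = lopsided_hinge_loss(scores, labels, penalties, {1: 1.5, -1: 1.0})
```

With equal margins the sketch reduces to a penalty-only, cost-sensitive hinge loss; raising the minority-class margin increases the loss on minority samples near the boundary, which is the kind of lopsided margin the abstract describes.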