The nature of statistical learning theory
The nature of statistical learning theory
Machine Learning
Genetic algorithms + data structures = evolution programs (3rd ed.)
Genetic algorithms + data structures = evolution programs (3rd ed.)
MetaCost: a general method for making classifiers cost-sensitive
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Optimizing classifiers for imbalanced training sets
Proceedings of the 1998 conference on Advances in neural information processing systems II
An introduction to support Vector Machines: and other kernel-based learning methods
An introduction to support Vector Machines: and other kernel-based learning methods
A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery
An Instance-Weighting Method to Induce Cost-Sensitive Trees
IEEE Transactions on Knowledge and Data Engineering
Mining with rarity: a unifying framework
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
ACM SIGKDD Explorations Newsletter
KBA: Kernel Boundary Alignment Considering Imbalanced Data Distribution
IEEE Transactions on Knowledge and Data Engineering
An introduction to ROC analysis
Pattern Recognition Letters - Special issue: ROC analysis in pattern recognition
Active learning for class imbalance problem
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
The foundations of cost-sensitive learning
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Response modeling with support vector machines
Expert Systems with Applications: An International Journal
Image and Vision Computing
Consistency of support vector machines and other regularized kernel classifiers
IEEE Transactions on Information Theory
IEEE Transactions on Neural Networks
Posterior probability support vector Machines for unbalanced data
IEEE Transactions on Neural Networks
Information Sciences: an International Journal
Computer Methods and Programs in Biomedicine
Learning SVM with weighted maximum margin criterion for classification of imbalanced data
Mathematical and Computer Modelling: An International Journal
Hi-index | 0.01 |
Imbalanced dataset learning is an important practical issue in machine learning, even in support vector machines (SVMs). In this study, a well known reference model for solving the problem proposed by Veropoulos et al., is first studied. From the aspect of loss function, the reference cost sensitive prototype is identified as a penalty-regularized model. Intuitively, the loss function can change not only the penalty but also the margin to recover the biased decision boundary. This study focuses mainly on the effect from the margin and then extends the model to a more general modification. As proposed in the prototype, the modification first adopts an inversed proportional regularized penalty to re-weight the imbalanced classes. In addition to the penalty regularization, the modification then employs a margin compensation to lead the margin to be lopsided, which enables the decision boundary drift. Two regularization factors, the penalty and margin, are hence suggested for achieving an unbiased classification. The margin compensation, associating with the penalty regularization, is here utilized to calibrate and refine the biased decision boundary to further reduce the bias. With the area under the receiver operating characteristic curve (AuROC) for examining the performance, the modification shows relative higher scores than the reference model, even though the optimal performance is achieved by the reference model. Some useful characteristics found empirically are also included, which may be convenient for the future applications. All the theoretical descriptions and experimental validations show the proposed model's potential to compete for highly unbiased accuracy in a complex imbalanced dataset.