Rare events are involved in many challenging real-world classification problems, where the minority class is usually the most expensive to sample and to label. As a consequence, training data are often imbalanced, with a heavily skewed distribution of labels. Conventional classification techniques then produce biased results: the classifier may easily perform very well on the over-represented class and very poorly on the under-represented class, because the former dominates the learning process and tends to attract all predictions. Furthermore, the classical accuracy measure is misleading, as it assumes equal importance for the true positives and the true negatives. We propose a classification procedure based on Support Vector Machines that copes effectively with data imbalance. Starting from a first-step approximate solution and then applying a suitable kernel transformation, we asymmetrically enlarge the space around the class boundary, compensating for data skewness. We also propose an accuracy measure, named AGF, that properly accounts for the different misclassification costs of the two classes. Tests on real-world data from a public repository show that the proposed approach outperforms its competitors.
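The claim that plain accuracy is misleading on skewed data can be checked with a small worked example. The sketch below is only illustrative: the 95/5 class split, the always-negative classifier, and all function names are assumptions made for this example, not part of the proposed method. It does not implement AGF, whose definition is not given here; it uses the geometric mean of sensitivity and specificity as a stand-in for a measure that looks at each class separately.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn), treating `positive` as the minority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of correct predictions, regardless of class."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true, y_pred):
    """True positive rate: fraction of minority samples that are recovered."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(y_true, y_pred):
    """True negative rate: fraction of majority samples that are recovered."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else 0.0


def geometric_mean(y_true, y_pred):
    """Geometric mean of the two per-class rates (a stand-in, not AGF)."""
    return math.sqrt(sensitivity(y_true, y_pred) * specificity(y_true, y_pred))


# Hypothetical data: 5 rare positives and 95 negatives.
y_true = [1] * 5 + [0] * 95
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

print("accuracy      :", accuracy(y_true, y_pred))
print("sensitivity   :", sensitivity(y_true, y_pred))
print("specificity   :", specificity(y_true, y_pred))
print("geometric mean:", geometric_mean(y_true, y_pred))
```

In this toy setting the degenerate classifier reaches 95% accuracy while missing every minority sample, so its sensitivity and geometric mean are both zero. Any measure meant for imbalanced problems needs to expose that failure, which is the motivation for a cost-aware measure such as AGF.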