Adjusted F-measure and kernel scaling for imbalanced data learning

Authors:
Antonio Maratea; Alfredo Petrosino; Mario Manzo
Venue:
Information Sciences: an International Journal
Year:
2014

Abstract

Rare events are involved in many challenging real-world classification problems, where the minority class is usually the most expensive to sample and to label. As a consequence, training data are often imbalanced, with a heavily skewed distribution of labels. Conventional classification techniques then produce biased results: the classifier may easily perform very well on the over-represented class and very poorly on the under-represented class, because the former dominates the learning process and tends to attract all predictions. Furthermore, the classical accuracy measure is misleading, as it assigns equal importance to true positives and true negatives, so a classifier that ignores the minority class can still score highly. We propose a classification procedure based on Support Vector Machines that effectively copes with data imbalance. Starting from a first-step approximate solution and then applying a suitable kernel transformation, we asymmetrically enlarge the space around the class boundary, compensating for data skewness. We also propose an accuracy measure, named AGF, that properly accounts for the different misclassification costs of the two classes. Tests on real-world data from a public repository show that the proposed approach outperforms its competitors.
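The point about accuracy can be made concrete with a small computation. The abstract names AGF but does not give its formula, so the sketch below assumes the form under which AGF is commonly reported: the geometric mean of F2 computed on the minority (positive) class and F0.5 computed after swapping the roles of the two classes. Treat that formula, the hypothetical `Confusion` type, the zero-denominator convention, and the confusion counts as illustrative assumptions rather than the authors' definition.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix; the minority class is treated as positive."""
    tp: int
    fn: int
    fp: int
    tn: int

    def swapped(self) -> "Confusion":
        """Same predictions, with the roles of the two classes exchanged."""
        return Confusion(tp=self.tn, fn=self.fp, fp=self.fn, tn=self.tp)


def accuracy(c: Confusion) -> float:
    """Fraction of correct predictions; every correct label counts the same."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def f_beta(c: Confusion, beta: float) -> float:
    """F-measure of the positive class; beta > 1 weights recall over precision.

    Uses F = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP), returning 0.0
    when the denominator is zero (an assumed convention).
    """
    b2 = beta * beta
    num = (1 + b2) * c.tp
    den = num + b2 * c.fn + c.fp
    return num / den if den > 0 else 0.0


def agf(c: Confusion) -> float:
    """Assumed AGF form: sqrt(F2 on minority class * F0.5 on majority class)."""
    return math.sqrt(f_beta(c, 2.0) * f_beta(c.swapped(), 0.5))


if __name__ == "__main__":
    # Hypothetical test set: 50 minority (positive) vs 950 majority samples.
    always_majority = Confusion(tp=0, fn=50, fp=0, tn=950)
    reasonable = Confusion(tp=40, fn=10, fp=60, tn=890)
    for name, c in [("always majority", always_majority),
                    ("reasonable", reasonable)]:
        print(f"{name}: accuracy={accuracy(c):.3f} AGF={agf(c):.3f}")
```

The classifier that always predicts the majority class gets the higher accuracy (0.950 vs 0.930) while finding no minority samples at all; under the assumed AGF it scores 0.000, against 0.807 for the classifier that recovers 40 of the 50 minority samples.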
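The abstract describes the kernel step only qualitatively. One standard way to enlarge the space around a boundary is a conformal kernel transformation K~(x, z) = D(x) D(z) K(x, z), where the factor D peaks on the first-step decision boundary f(x) = 0. The sketch below assumes D(x) = exp(-k f(x)^2) with a different rate k on each side of the boundary, which makes the enlargement asymmetric: D stays close to 1 over a wider band on the side with the smaller rate. The factor's form, the function names, the rates, and the placeholder decision function `f_first` are all hypothetical; this is not the authors' exact transformation, and no SVM is trained here.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(gamma: float) -> Kernel:
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    def k(x: Vector, z: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))
    return k


def asymmetric_factor(f: Callable[[Vector], float],
                      k_pos: float, k_neg: float) -> Callable[[Vector], float]:
    """Hypothetical D(x): equals 1 where f(x) = 0 and decays at rate k_pos
    on the positive side of the boundary and k_neg on the negative side."""
    def d(x: Vector) -> float:
        fx = f(x)
        k = k_pos if fx >= 0 else k_neg
        return math.exp(-k * fx * fx)
    return d


def conformal_scale(kernel: Kernel, d: Callable[[Vector], float]) -> Kernel:
    """Conformally scaled kernel K~(x, z) = D(x) * D(z) * K(x, z)."""
    def k_tilde(x: Vector, z: Vector) -> float:
        return d(x) * d(z) * kernel(x, z)
    return k_tilde


def gram(kernel: Kernel, xs: Sequence[Vector]) -> list[list[float]]:
    """Gram matrix of a kernel over a list of points."""
    return [[kernel(a, b) for b in xs] for a in xs]


def f_first(x: Vector) -> float:
    """Placeholder for the first-step SVM decision function (assumed)."""
    return x[0] - 0.5


if __name__ == "__main__":
    d = asymmetric_factor(f_first, k_pos=1.0, k_neg=4.0)
    k_tilde = conformal_scale(rbf_kernel(gamma=0.5), d)
    points = [[0.5, 0.0], [1.5, 0.0], [-0.5, 0.0]]
    for row in gram(k_tilde, points):
        print(["%.4f" % v for v in row])
```

Because the scaled Gram matrix equals diag(D) G diag(D) for the original Gram matrix G, it stays symmetric and positive semidefinite, so the scaled kernel can be passed to a second SVM training run in place of the original one.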