SVMs modeling for highly imbalanced classification

  • Authors:
  • Yuchun Tang;Yan-Qing Zhang;Nitesh V. Chawla;Sven Krasser

  • Affiliations:
  • McAfee Inc., Alpharetta, GA;Department of Computer Science, Georgia State University, Atlanta, GA;Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN;McAfee Inc., Alpharetta, GA

  • Venue:
  • IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on human computing
  • Year:
  • 2009


Abstract

Traditional classification algorithms can be limited in their performance on highly imbalanced data sets. A popular stream of work for countering the problem of class imbalance has been the application of an assortment of sampling strategies. In this correspondence, we focus on designing modifications to support vector machines (SVMs) to appropriately tackle the problem of class imbalance. We incorporate different "rebalance" heuristics in SVM modeling, including cost-sensitive learning and over- and undersampling. These SVM-based strategies are compared with various state-of-the-art approaches on a variety of data sets using several metrics, including G-mean, area under the receiver operating characteristic curve, F-measure, and area under the precision/recall curve. We show that we are able to surpass or match the previously known best algorithms on each data set. In particular, of the four SVM variations considered in this correspondence, the novel granular SVMs-repetitive undersampling algorithm (GSVM-RU) is the best in terms of both effectiveness and efficiency. GSVM-RU is effective because it minimizes the negative effect of information loss while maximizing the positive effect of data cleaning in the undersampling process. GSVM-RU is efficient because it extracts far fewer support vectors and hence greatly speeds up SVM prediction.
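Two of the building blocks the abstract mentions, undersampling the majority class and evaluating with the G-mean, can be illustrated in a few lines. The sketch below is not the paper's GSVM-RU algorithm (which selects negative examples via support vectors over repeated SVM runs); it is a minimal, assumption-laden stand-in showing plain random undersampling and the G-mean computed from a confusion matrix. All function names here are hypothetical.

```python
import random

def undersample(X, y, ratio=1.0, seed=0):
    """Randomly keep only ratio * (#positives) negative examples.
    A simple stand-in for the paper's more careful, granule-based
    undersampling; y uses 1 for the (rare) positive class, 0 otherwise."""
    rng = random.Random(seed)
    pos = [(x, t) for x, t in zip(X, y) if t == 1]
    neg = [(x, t) for x, t in zip(X, y) if t == 0]
    keep = rng.sample(neg, min(len(neg), int(ratio * len(pos))))
    data = pos + keep
    rng.shuffle(data)
    Xs, ys = zip(*data)
    return list(Xs), list(ys)

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on positives) and
    specificity (recall on negatives): a standard metric for
    imbalanced classification, since plain accuracy is dominated
    by the majority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return (sens * spec) ** 0.5
```

For example, on a 1-positive / 5-negative toy set, `undersample` with `ratio=1.0` returns a balanced 2-example set; and a classifier that catches half the positives while making no false alarms scores a G-mean of about 0.707 (sqrt of 0.5 × 1.0).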