Traditional classification algorithms can perform poorly on highly imbalanced data sets. A popular line of work for countering class imbalance applies various sampling strategies. In this correspondence, we focus on modifying support vector machines (SVMs) to handle class imbalance. We incorporate different "rebalance" heuristics into SVM modeling, including cost-sensitive learning, oversampling, and undersampling. These SVM-based strategies are compared with various state-of-the-art approaches on a variety of data sets using several metrics, including G-mean, area under the receiver operating characteristic curve, F-measure, and area under the precision/recall curve. We show that we are able to surpass or match the previously known best algorithms on each data set. In particular, of the four SVM variations considered in this correspondence, the novel granular SVMs-repetitive undersampling algorithm (GSVM-RU) is the best in terms of both effectiveness and efficiency. GSVM-RU is effective because it can minimize the negative effect of information loss while maximizing the positive effect of data cleaning in the undersampling process. GSVM-RU is efficient because it extracts far fewer support vectors and hence greatly speeds up SVM prediction.
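Two of the metrics named above, G-mean and F-measure, can be illustrated with a minimal sketch. It assumes the standard definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall). The function names and the toy labels are hypothetical and are not taken from the paper; the sketch only suggests why plain accuracy can mislead on imbalanced data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fp, tn, fn):
    """Harmonic mean of precision and recall on the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 5 minority positives, 95 majority negatives.
y_true = [1] * 5 + [0] * 95

# A trivial classifier that always predicts the majority class.
y_trivial = [0] * 100

# A classifier that finds 4 of 5 positives at the cost of 10 false alarms.
y_better = [1] * 4 + [0] * 1 + [1] * 10 + [0] * 85

for name, y_pred in [("trivial", y_trivial), ("better", y_better)]:
    counts = confusion_counts(y_true, y_pred)
    print(name, accuracy(*counts), g_mean(*counts), f_measure(*counts))
```

On this toy data the always-negative classifier has the higher accuracy but a G-mean and F-measure of zero, while the classifier that recovers most minority examples scores lower on accuracy and higher on both imbalance-aware metrics.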