Data in many biological problems are often complicated by an imbalanced class distribution. That is, the positive examples may be greatly outnumbered by the negative examples. Many classification algorithms, such as the support vector machine (SVM), are sensitive to data with an imbalanced class distribution and produce suboptimal classifications as a result. It is therefore desirable to compensate for the imbalance effect during model training to obtain more accurate classification. In this study, we propose a sample subset optimization technique for classifying biological data with moderately and extremely imbalanced class distributions. By using this optimization technique with an ensemble of SVMs, we build multiple roughly balanced SVM base classifiers, each trained on an optimized sample subset. The experimental results demonstrate that the ensemble of SVMs created by our sample subset optimization technique achieves a higher area under the ROC curve (AUC) than popular sampling approaches such as random over-/under-sampling and SMOTE, and than widely used ensemble approaches such as bagging and boosting.
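The general shape of an ensemble of roughly balanced SVM base classifiers, scored by AUC, can be sketched as follows. This is a minimal illustration and not the method proposed in the study: the abstract does not describe the sample subset optimization procedure, so plain random under-sampling of the negative class stands in for it here. The linear SVM trained by subgradient descent, the synthetic Gaussian data, and all function names and parameters (`train_linear_svm`, `balanced_svm_ensemble`, `n_members`, `lr`, `lam`) are assumptions made only for this example.

```python
import random

def train_linear_svm(X, y, epochs=50, lr=0.01, lam=0.01, seed=0):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.
    Labels must be +1 / -1. Hypothetical helper, not from the paper."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Shrink weights (L2 penalty), then step on the hinge loss
            # only when the margin constraint is violated.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def decision(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def balanced_svm_ensemble(X, y, n_members=10, seed=0):
    """Each member sees every positive plus an equally sized random draw of
    negatives. ASSUMPTION: positives are the minority class, and random
    under-sampling replaces the paper's optimized subset selection."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    models = []
    for m in range(n_members):
        subset = pos + rng.sample(neg, min(len(pos), len(neg)))
        models.append(train_linear_svm([X[i] for i in subset],
                                       [y[i] for i in subset], seed=seed + m))
    return models

def ensemble_score(models, x):
    # Average the members' decision values to get one ranking score.
    return sum(decision(m, x) for m in models) / len(models)

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_data(n_pos, n_neg, seed):
    # Synthetic 2-D data: positives around (3, 3), negatives around (0, 0).
    rng = random.Random(seed)
    X = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(n_pos)]
    X += [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    return X, [1] * n_pos + [-1] * n_neg

X_train, y_train = make_data(10, 200, seed=1)   # 1:20 class imbalance
X_test, y_test = make_data(10, 200, seed=2)
models = balanced_svm_ensemble(X_train, y_train)
test_auc = auc([ensemble_score(models, x) for x in X_test], y_test)
print(f"ensemble AUC on synthetic test data: {test_auc:.3f}")
```

Averaging decision values rather than hard votes keeps a continuous score per example, which is what a ranking metric such as AUC needs. Replacing the random draw inside `balanced_svm_ensemble` with a subset chosen by an optimization criterion is where a technique like the one described above would plug in.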