Combining integrated sampling with SVM ensembles for learning from imbalanced datasets

Authors:
Yang Liu; Xiaohui Yu; Jimmy Xiangji Huang; Aijun An
Affiliations:
School of Computer Science and Technology, Shandong University, Jinan, China; School of Computer Science and Technology, Shandong University, Jinan, China and School of Information Technology, York University, Toronto, Canada; School of Information Technology, York University, Toronto, Canada; Department of Computer Science and Engineering, York University, Toronto, Canada
Venue:
Information Processing and Management: an International Journal
Year:
2011

Abstract

Learning from imbalanced datasets is difficult. Because the minority class contributes so few examples, there is too little information to form a clear understanding of the inherent structure of the dataset. Most existing classification methods tend to perform poorly on minority class examples when the dataset is extremely imbalanced, because they aim to optimize overall accuracy without considering the relative distribution of each class. In this paper, we study the performance of SVMs, which have been highly successful in many real applications, in the imbalanced data context. Through empirical analysis, we show that SVMs may suffer from biased decision boundaries, and that their prediction performance drops dramatically when the data is highly skewed. We propose combining an integrated sampling technique, which incorporates both over-sampling and under-sampling, with an ensemble of SVMs to improve prediction performance. Extensive experiments show that our method outperforms individual SVMs as well as several other state-of-the-art classifiers.
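
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes random duplication for over-sampling the minority class, a disjoint partition of the majority class for under-sampling, and majority voting to combine the ensemble. All function names are hypothetical, and a nearest-centroid classifier stands in for the SVM base learner so that the example runs with the Python standard library alone.

```python
import random


def integrated_samples(majority, minority, n_models, oversample_factor, rng):
    """Build one training set per ensemble member (assumed scheme, see lead-in)."""
    if not minority or n_models < 1 or n_models > len(majority):
        raise ValueError("need a non-empty minority and 1 <= n_models <= len(majority)")
    # Over-sampling: keep every minority example and add random duplicates.
    extra = [rng.choice(minority)
             for _ in range(len(minority) * (oversample_factor - 1))]
    boosted_minority = list(minority) + extra
    # Under-sampling: split the shuffled majority class into disjoint chunks,
    # so each member sees only a fraction of the majority examples.
    shuffled = list(majority)
    rng.shuffle(shuffled)
    chunks = [shuffled[i::n_models] for i in range(n_models)]
    return [(chunk, boosted_minority) for chunk in chunks]


def train_centroid(negatives, positives):
    """Placeholder base learner (NOT an SVM): nearest class centroid."""
    def mean(points):
        return [sum(coord) / len(points) for coord in zip(*points)]

    neg_c, pos_c = mean(negatives), mean(positives)

    def predict(x):
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
        return 1 if d_pos < d_neg else 0  # 1 = minority class

    return predict


def train_ensemble(majority, minority, n_models=5, oversample_factor=3,
                   train=train_centroid, seed=0):
    """Train one base learner per integrated sample."""
    rng = random.Random(seed)
    samples = integrated_samples(majority, minority, n_models,
                                 oversample_factor, rng)
    return [train(neg, pos) for neg, pos in samples]


def predict_majority_vote(models, x):
    """Combine member predictions; ties go to the majority class (0)."""
    votes = sum(model(x) for model in models)
    return 1 if votes * 2 > len(models) else 0
```

To use an actual SVM, one would pass a different `train` callable that fits an SVM on the negatives and positives of each sample and returns its prediction function. The ensemble exists so that no majority example is discarded outright: each member sees a more balanced subset, and together the members cover the whole majority class.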