Inverse random under sampling for class imbalance problem and its application to multi-label classification

Authors:
Muhammad Atif Tahir;Josef Kittler;Fei Yan
Affiliations:
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK and School of Computing, Engineering and Information Science, Northumbria University, Newcastle Upon Ty ...;Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK;Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK
Venue:
Pattern Recognition
Year:
2012




Abstract

In this paper, a novel inverse random under sampling (IRUS) method is proposed for the class imbalance problem. The main idea is to severely undersample the majority class, thereby creating a large number of distinct training sets. For each training set, we then find a decision boundary that separates the minority class from the majority class. By combining the multiple designs through fusion, we construct a composite boundary between the majority class and the minority class. The proposed methodology is applied to 22 UCI data sets, and experimental results indicate a significant increase in performance when compared with many existing class-imbalance learning methods. We also present promising results for multi-label classification, a challenging research problem in many modern applications such as music, text and image categorization.
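
The procedure outlined in the abstract (draw many small subsets of the majority class, fit one decision boundary per subset, then fuse the results) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the nearest-centroid base learner, the vote-averaging fusion rule, the default subset size of half the minority class, and all names (`train_irus_sketch`, `minority_score`, `predict`, `n_sets`, `subset_size`) are assumptions made only for illustration.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_irus_sketch(minority, majority, n_sets=25, subset_size=None, seed=0):
    """Fit one simple model per severely undersampled training set.

    Assumptions (not taken from the source): a nearest-centroid base
    learner, n_sets=25, and a default subset size of half the minority
    class, so that each subset is smaller than the minority class.
    """
    rng = random.Random(seed)
    if subset_size is None:
        subset_size = max(1, len(minority) // 2)
    subset_size = min(subset_size, len(majority))
    minority_centroid = centroid(minority)
    models = []
    for _ in range(n_sets):
        # Each training set keeps all minority examples and only a small
        # random subset of the majority class.
        subset = rng.sample(majority, subset_size)
        models.append((minority_centroid, centroid(subset)))
    return models


def minority_score(models, x):
    """Fuse the models by averaging their votes for the minority class."""
    votes = sum(
        1 for min_c, maj_c in models if sq_dist(x, min_c) < sq_dist(x, maj_c)
    )
    return votes / len(models)


def predict(models, x, threshold=0.5):
    """Return 1 (minority) if the fused score reaches the threshold, else 0."""
    return 1 if minority_score(models, x) >= threshold else 0


# Synthetic two-dimensional data: 10 minority points, 200 majority points.
data_rng = random.Random(42)
minority = [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(10)]
majority = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(200)]
models = train_irus_sketch(minority, majority)
```

Calling `predict(models, point)` then labels a new point by the fused vote. Any base classifier could stand in for the nearest-centroid rule; it is used here only to keep the sketch dependency-free.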