Resampling methods are among the approaches proposed to deal with the class-imbalance problem. Although such approaches are simple, tuning them effectively is not easy: it is unclear whether oversampling outperforms undersampling, and which oversampling or undersampling rate should be used. This paper presents an experimental study of these questions and concludes that combining different expressions of the resampling approach in a mixture-of-experts framework is an effective solution to the tuning problem. The proposed combination scheme is evaluated on a subset of the REUTERS-21578 text collection (the 10 top categories) and is shown to be very effective when the data are drastically imbalanced.
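The combination idea can be sketched as follows: instead of committing to one resampling rate, train one "expert" per rate and combine their predictions by majority vote. This is only a minimal illustration, not the paper's actual method; the rate semantics, the `oversample_minority` helper, and the nearest-centroid base learner are all assumptions chosen for brevity.

```python
import random
from collections import Counter

def oversample_minority(X, y, rate, rng):
    """Randomly duplicate minority examples until the minority class
    reaches `rate` times the majority-class size (hypothetical semantics)."""
    counts = Counter(y)
    maj = max(counts, key=counts.get)
    mino = min(counts, key=counts.get)
    minority = [x for x, lab in zip(X, y) if lab == mino]
    target = int(rate * counts[maj])
    extra = [rng.choice(minority) for _ in range(max(0, target - counts[mino]))]
    return X + extra, y + [mino] * len(extra)

def centroid_classifier(X, y):
    """Fit a nearest-centroid classifier (a stand-in for any base learner)."""
    sums, ns = {}, {}
    for x, lab in zip(X, y):
        s = sums.setdefault(lab, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        ns[lab] = ns.get(lab, 0) + 1
    cents = {lab: [v / ns[lab] for v in s] for lab, s in sums.items()}
    def predict(x):
        def d2(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(cents, key=lambda lab: d2(cents[lab]))
    return predict

def mixture_of_resampled_experts(X, y, rates, seed=0):
    """Train one expert per oversampling rate; combine by majority vote."""
    rng = random.Random(seed)
    experts = []
    for r in rates:
        Xr, yr = oversample_minority(X, y, r, rng)
        experts.append(centroid_classifier(Xr, yr))
    def predict(x):
        votes = Counter(e(x) for e in experts)
        return votes.most_common(1)[0][0]
    return predict
```

Combining several rates this way sidesteps the need to pick a single "best" rate: experts trained under different class distributions compensate for one another, which is the intuition behind the mixture framework evaluated in the paper.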