Resampling methods are among the approaches proposed to deal with the class-imbalance problem. Although such approaches are simple, tuning them effectively is not easy: it is unclear whether oversampling outperforms undersampling, and which oversampling or undersampling rate should be used. This paper presents an experimental study of these questions and concludes that combining different expressions of the resampling approach in a mixture-of-experts framework is an effective solution to the tuning problem. The proposed combination scheme is evaluated on a subset of the REUTERS-21578 text collection (the 10 top categories) and is shown to be very effective when the data are drastically imbalanced.
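The combination idea can be sketched as follows: instead of committing to one resampling rate, train one "expert" per rate and combine their predictions by majority vote. This is only a minimal illustration, not the paper's actual method; the rate semantics, the `oversample_minority` helper, and the nearest-centroid base learner are all assumptions chosen for brevity.

```python
import random
from collections import Counter

def oversample_minority(X, y, rate, rng):
    """Randomly duplicate minority examples until the minority class
    reaches `rate` times the majority-class size (hypothetical semantics)."""
    counts = Counter(y)
    maj = max(counts, key=counts.get)
    mino = min(counts, key=counts.get)
    minority = [x for x, lab in zip(X, y) if lab == mino]
    target = int(rate * counts[maj])
    extra = [rng.choice(minority) for _ in range(max(0, target - counts[mino]))]
    return X + extra, y + [mino] * len(extra)

def centroid_classifier(X, y):
    """Fit a nearest-centroid classifier (a stand-in for any base learner)."""
    sums, ns = {}, {}
    for x, lab in zip(X, y):
        s = sums.setdefault(lab, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        ns[lab] = ns.get(lab, 0) + 1
    cents = {lab: [v / ns[lab] for v in s] for lab, s in sums.items()}
    def predict(x):
        def d2(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(cents, key=lambda lab: d2(cents[lab]))
    return predict

def mixture_of_resampled_experts(X, y, rates, seed=0):
    """Train one expert per oversampling rate; combine by majority vote."""
    rng = random.Random(seed)
    experts = []
    for r in rates:
        Xr, yr = oversample_minority(X, y, r, rng)
        experts.append(centroid_classifier(Xr, yr))
    def predict(x):
        votes = Counter(e(x) for e in experts)
        return votes.most_common(1)[0][0]
    return predict
```

Combining several rates this way sidesteps the need to pick a single "best" rate: experts trained under different class distributions compensate for one another, which is the intuition behind the mixture framework evaluated in the paper.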