Learning from imbalanced data in presence of noisy and borderline examples

Authors:
Krystyna Napierała;Jerzy Stefanowski;Szymon Wilk
Affiliations:
Institute of Computing Science, Poznań University of Technology, Poznań, Poland;Institute of Computing Science, Poznań University of Technology, Poznań, Poland;Institute of Computing Science, Poznań University of Technology, Poznań, Poland
Venue:
RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing
Year:
2010

Citing 4

Class imbalances versus small disjuncts
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets

Selective Pre-processing of Imbalanced Data for Improving Classification Performance
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery

Learning from Imbalanced Data
IEEE Transactions on Knowledge and Data Engineering

An empirical study of the behavior of classifiers on imbalanced and overlapped data sets
CIARP'07 Proceedings of the Congress on pattern recognition 12th Iberoamerican conference on Progress in pattern recognition, image analysis and applications

Cited 7

Addressing the classification with imbalanced data: open problems and new challenges on class distribution
HAIS'11 Proceedings of the 6th international conference on Hybrid artificial intelligent systems - Volume Part I

Identification of different types of minority class examples in imbalanced data
HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part II

BRACID: a comprehensive approach to learning rules from imbalanced data
Journal of Intelligent Information Systems

DBFS: An effective Density Based Feature Selection scheme for small sample size and high dimensional imbalanced data sets
Data & Knowledge Engineering

A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets
Knowledge-Based Systems

Addressing imbalanced classification with instance generation techniques: IPADE-ID
Neurocomputing

IIvotes ensemble for imbalanced data
Intelligent Data Analysis - Combined Learning Methods and Mining Complex Data

Abstract

In this paper we studied re-sampling methods for learning classifiers from imbalanced data. We carried out a series of experiments on artificial data sets to explore the impact of noisy and borderline examples from the minority class on classifier performance. The results showed that when the data were sufficiently disturbed by these factors, the focused re-sampling methods (NCR and our SPIDER2) strongly outperformed the oversampling methods. They also performed better on real-life data, where PCA visualizations suggested the possible existence of noisy examples and large overlapping areas between classes.
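
The abstract names NCR (the Neighbourhood Cleaning Rule) as one of the focused re-sampling methods but does not describe it. As a rough illustration only, the following is a simplified two-class sketch in the spirit of neighbourhood cleaning: a majority example is dropped when most of its nearest neighbours belong to the other class, and when a minority example is outvoted by its neighbours, those majority neighbours are dropped instead. The function names, the choice of k = 3, and the toy data are assumptions made for this example; they are not taken from the paper, and the sketch omits details of the published method.

```python
from math import dist


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: dist(points[idx], points[j]))
    return others[:k]


def ncr_style_clean(points, labels, minority, k=3):
    """Return indices kept after one simplified neighbourhood-cleaning pass.

    Two-class setting only. All neighbourhoods are computed on the
    original data; removals are applied once at the end.
    """
    to_remove = set()
    for i, label in enumerate(labels):
        neighbours = k_nearest(i, points, k)
        same = sum(1 for j in neighbours if labels[j] == label)
        # The example is "misclassified" if most neighbours disagree with it.
        if same >= k - same:
            continue
        if label != minority:
            # Majority example lying among minority examples: drop it.
            to_remove.add(i)
        else:
            # Outvoted minority example: drop its majority neighbours.
            to_remove.update(j for j in neighbours if labels[j] != minority)
    return [i for i in range(len(points)) if i not in to_remove]


# Hypothetical toy data: a majority cluster, a minority cluster, and one
# majority point (index 10) placed inside the minority cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (2, 0),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (5.5, 5.5)]
labels = ["maj"] * 6 + ["min"] * 4 + ["maj"]

kept = ncr_style_clean(points, labels, minority="min")
print(kept)
```

On this toy data only the majority point placed inside the minority cluster is removed; no minority example is deleted and no synthetic example is created, which is the general contrast with oversampling that the abstract draws.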