Integrating selective pre-processing of imbalanced data with Ivotes ensemble
RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing
In this paper we present IIvotes, a new framework for constructing an ensemble of classifiers from imbalanced data. IIvotes incorporates the SPIDER method for selective data pre-processing into the adaptive Ivotes ensemble. This integration aims to improve the balance between sensitivity and specificity, evaluated by the G-mean measure for the minority class, in comparison with single classifiers combined with SPIDER. Using SPIDER to pre-process the learning samples inside the ensemble improves the sensitivity of the derived component classifiers, while the controlling mechanism of IIvotes ensures that overall accuracy, and thus specificity, is kept at a reasonable level. The proposed IIvotes ensemble was thoroughly evaluated in a series of experiments in which we tested it with symbolic (decision trees and rules) and non-symbolic (Naive Bayes) component classifiers. The results confirmed that combining SPIDER with an ensemble improved performance in terms of the G-mean measure over a single classifier with SPIDER, for all tested types of classifiers and for both SPIDER pre-processing options (weak and strong amplification). These advantages were especially evident for decision trees and rules, where the differences between single and ensemble classifiers with SPIDER were more significant, for both pre-processing options, than for Naive Bayes. Moreover, the results demonstrated the advantages of a special abstaining classification strategy inside IIvotes rule ensembles, in which component rule-based classifiers may refrain from predicting a class when in doubt: abstaining rule ensembles performed considerably better with regard to G-mean than their non-abstaining variants.
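The G-mean measure used throughout the evaluation is the geometric mean of sensitivity (recall on the minority class) and specificity (recall on the majority class). A minimal sketch of its computation from a binary confusion matrix (the function name and signature are illustrative, not from the paper):

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity for a binary
    confusion matrix, with the minority class as the positive class."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # minority-class recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # majority-class recall
    return math.sqrt(sensitivity * specificity)

# A classifier that ignores the minority class entirely can still score
# high accuracy on imbalanced data, but its G-mean collapses to zero:
# g_mean(0, 50, 100, 0) == 0.0, even though accuracy is 100/150.
```

This is why the abstract speaks of balancing sensitivity against specificity: G-mean rewards only classifiers that do well on both classes at once.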
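The abstaining strategy means a component rule classifier may return no prediction when its rules do not cover an example unambiguously, and the ensemble aggregates only the votes that were actually cast. The following is a simple illustration of majority voting over possibly abstaining components; it is a hedged sketch of the general idea, not the authors' exact aggregation rule, and the fallback class for the all-abstain case is an assumption:

```python
from collections import Counter

def vote_with_abstaining(component_predictions, fallback):
    """Majority vote over component classifiers, where None marks an
    abstaining component. Abstentions simply cast no vote; if every
    component abstains, the (assumed) fallback class is returned."""
    votes = [p for p in component_predictions if p is not None]
    if not votes:
        return fallback  # all components refrained from predicting
    return Counter(votes).most_common(1)[0][0]
```

For example, `vote_with_abstaining(["pos", None, "pos", "neg"], "pos")` resolves to `"pos"`: the abstaining component neither supports nor dilutes the majority, which is how uncertain components can stay out of a decision instead of adding noise.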