This paper introduces a novel splitting criterion, parametrized by a scalar α, for building a class-imbalance-resistant ensemble of decision trees. The proposed criterion generalizes the information gain used in C4.5, and its extended form also encompasses the Gini (CART) and DKM splitting criteria. Each tree in the ensemble is grown with a different splitting criterion, obtained by fixing a distinct value of α. Compared with other ensemble methods, the resulting ensemble exhibits improved performance across a variety of imbalanced datasets, even with small numbers of trees.
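The abstract does not give the criterion's closed form, so the sketch below uses a well-known α-parametrized impurity family with the stated limiting cases, Tsallis entropy, as a stand-in; the names `tsallis_impurity` and `split_gain` are illustrative, not the paper's actual definitions. α → 1 recovers Shannon entropy (C4.5's information gain), α = 2 yields the Gini impurity (CART), and α = 1/2 is monotonically related to the DKM criterion 2√(p(1−p)) on two-class problems.

```python
import math

def tsallis_impurity(probs, alpha):
    """Tsallis entropy (1 - sum_i p_i^alpha) / (alpha - 1) of a class distribution.

    alpha -> 1: Shannon entropy (C4.5's information gain);
    alpha  = 2: Gini impurity (CART);
    alpha  = 1/2: monotonically related to DKM's 2*sqrt(p*(1-p)) for two classes.
    """
    if abs(alpha - 1.0) < 1e-12:  # continuous limit at alpha = 1
        return -sum(p * math.log(p) for p in probs if p > 0.0)
    return (1.0 - sum(p ** alpha for p in probs)) / (alpha - 1.0)

def split_gain(parent, children, alpha):
    """Impurity reduction of a candidate split: parent impurity minus the
    size-weighted impurity of the child nodes. Nodes are lists of class counts."""
    def dist(counts):
        n = sum(counts)
        return [c / n for c in counts]
    n_parent = sum(parent)
    weighted = sum(
        (sum(child) / n_parent) * tsallis_impurity(dist(child), alpha)
        for child in children
    )
    return tsallis_impurity(dist(parent), alpha) - weighted

# Each tree in the ensemble would be grown with a distinct alpha, e.g.:
alphas = [0.5, 1.0, 2.0]  # DKM-like, information gain, Gini
```

Growing each tree with a different α makes the base learners disagree about which splits look best, which is one plausible way to obtain the diversity the abstract attributes to the ensemble.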