This paper presents an optimization of Random Forests that aims both at adapting the forest paradigm to learning from imbalanced data and at taking the user's requirements on recall and precision rates into account. We propose to adapt Random Forests at two levels. First, during forest construction, through the use of an asymmetric entropy measure combined with specific rules for assigning classes to leaves. Second, during the voting step, through an alternative to the classical majority-voting strategy. Automating this second step requires a dedicated methodology for assessing the quality of results. This methodology allows the user to specify (1) target recall and precision rates for each class of the concept to learn, and (2) the relative importance to assign to each of those classes. Finally, results of experimental evaluations are presented.
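To make the two adaptation levels concrete, here is a minimal illustrative sketch, not the paper's exact formulation: an asymmetric binary impurity whose maximum is shifted from p = 0.5 to a user-chosen minority proportion `theta` (so that splits isolating small pockets of minority examples still look "impure" and worth splitting), and a non-majority forest vote that flags the minority class as soon as a fraction `tau` (possibly below 0.5) of the trees votes for it. The functional form of the impurity and the names `asymmetric_entropy`, `forest_vote`, `theta`, and `tau` are assumptions for this sketch.

```python
import numpy as np

def asymmetric_entropy(p, theta):
    """Binary impurity that peaks (with value 1) at p = theta, not at 0.5.

    p     : estimated minority-class probability in a node, in [0, 1]
    theta : target minority proportion, in (0, 1)

    Illustrative form only; the paper's own measure may differ.
    """
    # Denominator is strictly positive for p in [0, 1] and theta in (0, 1):
    # at p = 0 it equals theta**2, at p = 1 it equals (1 - theta)**2.
    return p * (1.0 - p) / ((1.0 - 2.0 * theta) * p + theta ** 2)

def forest_vote(minority_fractions, tau):
    """Alternative to majority voting: predict the minority class whenever
    at least a fraction tau of the trees votes for it (tau may be < 0.5,
    trading precision for recall on the minority class)."""
    return np.asarray(minority_fractions, dtype=float) >= tau
```

For example, with `theta = 0.1` a node holding 10% minority examples has maximal impurity, whereas the classical (symmetric) entropy would already consider it fairly pure; and with `tau = 0.25` a quarter of the trees suffices to trigger a minority prediction. Tuning `tau` is one simple way to realize the user-specified trade-off between recall and precision for each class.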