An Empirical Study of Learning from Imbalanced Data Using Random Forest

  • Authors:
  • Taghi M. Khoshgoftaar;Moiz Golawala;Jason Van Hulse

  • Venue:
  • ICTAI '07 Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence - Volume 02
  • Year:
  • 2007

Abstract

This paper discusses a comprehensive suite of experiments that analyze the performance of the random forest (RF) learner implemented in Weka. RF is a relatively new learner, and to the best of our knowledge, only preliminary experimentation on the construction of random forest classifiers in the context of imbalanced data has been reported in previous work. Therefore, the contribution of this study is to provide an extensive empirical evaluation of RF learners built from imbalanced data. What should be the recommended default number of trees in the ensemble? What should the recommended value be for the number of attributes? How does the RF learner perform on imbalanced data when compared with other commonly-used learners? We address these and other related issues in this work.
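
The two tuning questions in the abstract amount to a sweep over two settings: the number of trees in the ensemble and the number of attributes considered at each split. The sketch below is only an illustration of how such a sweep could be enumerated and is not the authors' experimental design. The function names, the tree counts, and the two rules of thumb for the attribute count (floor(log2(M)) + 1 and floor(sqrt(M)), both commonly cited heuristics) are assumptions introduced here; the abstract does not state which values the study tested.

```python
import math
from itertools import product


def log2_rule(num_attributes: int) -> int:
    """Heuristic (assumed here): floor(log2(M)) + 1 attributes per split."""
    return int(math.log2(num_attributes)) + 1


def sqrt_rule(num_attributes: int) -> int:
    """Heuristic (assumed here): floor(sqrt(M)) attributes per split, at least 1."""
    return max(1, int(math.sqrt(num_attributes)))


def build_grid(num_attributes: int, tree_counts=(10, 50, 100)):
    """Enumerate hypothetical (num_trees, num_attributes) settings to evaluate.

    The tree counts are placeholder values, not taken from the paper.
    """
    # Candidate subset sizes: a single attribute, the two heuristics,
    # and all M attributes (no random attribute selection at all).
    candidates = sorted({
        1,
        log2_rule(num_attributes),
        sqrt_rule(num_attributes),
        num_attributes,
    })
    return [
        {"num_trees": trees, "num_attributes": k}
        for trees, k in product(tree_counts, candidates)
    ]


if __name__ == "__main__":
    for setting in build_grid(num_attributes=16):
        print(setting)
```

Each setting would then be passed to an RF implementation and scored on the imbalanced datasets with a metric that is not dominated by the majority class.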