Towards a better understanding of random forests through the study of strength and correlation

  • Authors:
  • Simon Bernard;Laurent Heutte;Sébastien Adam

  • Affiliations:
  • Université de Rouen, LITIS, Saint-Etienne du Rouvray, France;Université de Rouen, LITIS, Saint-Etienne du Rouvray, France;Université de Rouen, LITIS, Saint-Etienne du Rouvray, France

  • Venue:
  • ICIC'09: Proceedings of the 5th International Conference on Intelligent Computing: Emerging Intelligent Computing Technology and Applications
  • Year:
  • 2009

Abstract

In this paper we present a study of the Random Forest (RF) family of ensemble methods. From our point of view, a "classical" RF induction process has two main drawbacks: (i) the number of trees has to be fixed a priori; (ii) because of the randomization, trees are added to the ensemble independently, and thus arbitrarily. Hence, this kind of process offers no guarantee that all the trees will cooperate well within the same committee. In this work we therefore propose to study the RF mechanisms that explain this cooperation by analysing, for particular subsets of trees called sub-forests, the link between accuracy and properties such as Strength and Correlation. We show that these properties, through the Correlation/Strength² ratio, should be taken into account to explain sub-forest performance.
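
To make the Correlation/Strength² ratio concrete, the following is a minimal sketch of how strength and correlation could be estimated for a sub-forest from the individual tree predictions. It follows the standard definitions from Breiman (2001), where the margin of a sample is the vote fraction for the true class minus the largest vote fraction for any other class, strength is the mean margin, and correlation is the variance of the margin divided by the squared mean of the per-tree standard deviations of the raw margin. The function name `strength_correlation`, the array layout, and the use of NumPy are assumptions made for illustration; the abstract does not say how the authors computed these quantities, and their estimators may differ.

```python
import numpy as np

def strength_correlation(preds, y, n_classes):
    """Estimate strength, correlation and correlation/strength^2 (hypothetical helper).

    preds: integer class labels, shape (n_trees, n_samples), one row per tree.
    y:     true integer class labels, shape (n_samples,).
    """
    preds = np.asarray(preds)
    y = np.asarray(y)
    n_samples = preds.shape[1]
    idx = np.arange(n_samples)

    # Fraction of trees voting for each class, shape (n_samples, n_classes).
    votes = np.stack([(preds == k).mean(axis=0) for k in range(n_classes)], axis=1)

    # Vote fraction for the true class, and for the strongest rival class.
    p_true = votes[idx, y]
    masked = votes.copy()
    masked[idx, y] = -1.0          # exclude the true class from the argmax
    j_hat = masked.argmax(axis=1)  # most-voted wrong class per sample
    p_rival = masked[idx, j_hat]

    # Margin per sample; strength is its mean over the samples.
    margin = p_true - p_rival
    strength = margin.mean()

    # Per-tree probabilities of predicting the true class (p1) or the rival (p2),
    # and the resulting standard deviation of each tree's raw margin.
    p1 = (preds == y).mean(axis=1)
    p2 = (preds == j_hat).mean(axis=1)
    sd = np.sqrt(p1 + p2 - (p1 - p2) ** 2)

    # Mean correlation: var(margin) / (mean per-tree sd)^2.
    correlation = margin.var() / sd.mean() ** 2
    return strength, correlation, correlation / strength ** 2
```

In Breiman's analysis the generalization error of a forest is bounded above by correlation × (1 − strength²) / strength², which is why a smaller Correlation/Strength² ratio is associated with a better ensemble. Note that the sketch divides by the strength and by the mean per-tree standard deviation, so it is undefined when either is zero.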