A fast splitting procedure for classification trees

  • Authors:
  • Francesco Mola; Roberta Siciliano

  • Affiliations:
  • Dipartimento di Matematica e Statistica, Università degli Studi di Napoli Federico II, Monte S. Angelo, via Cintia, 80126 Napoli, Italy (both authors); f.mola@dmsna.dms.unina.it, r.sic@dmsna.dms.unina.it

  • Venue:
  • Statistics and Computing
  • Year:
  • 1997

Abstract

This paper provides a faster method for finding the best split at each node under the CART methodology. The predictability index τ is proposed as a splitting rule that grows the same classification tree as CART does when the Gini index of heterogeneity is used as the impurity measure. A theorem establishes a new property of the index τ: the τ of a given predictor is never lower than the τ of any split generated by that predictor. This property is used to substantially reduce the time required to generate a classification tree. Three simulation studies show the computational gain in terms of both the number of splits analysed at each node and the CPU time. An example shows that the proposed splitting algorithm is also computationally efficient on real data sets.
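
The bound stated in the abstract can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the authors' published algorithm. It assumes categorical predictors summarised as count tables (rows are predictor categories, columns are response classes), an impure node, and the Goodman-Kruskal form of τ, which equals the Gini impurity decrease divided by the node impurity. All function names (`tau`, `binary_splits`, `split_table`, `best_split`) and the toy tables `x1` and `x2` are hypothetical. The search ranks predictors by their overall τ and stops as soon as the best split found so far has a τ at least as large as the τ of the next predictor, since no split of that predictor could then do better.

```python
from itertools import combinations


def tau(table):
    """Goodman-Kruskal tau of the response (columns) given the rows of a count table.

    Assumes the node is impure, i.e. at least two response classes are present.
    """
    n = sum(sum(row) for row in table)
    col_tot = [sum(col) for col in zip(*table)]
    gini = 1.0 - sum((c / n) ** 2 for c in col_tot)  # Gini impurity at the node
    # Sum over rows i and classes j of p_ij^2 / p_i. (empty rows contribute nothing)
    within = sum(sum(c * c for c in row) / (n * sum(row))
                 for row in table if sum(row) > 0)
    return (within - (1.0 - gini)) / gini


def binary_splits(k):
    """Yield each split of categories 0..k-1 into two nonempty groups exactly once.

    Only the left group is yielded; it always contains category 0.
    """
    for r in range(0, k - 1):
        for extra in combinations(range(1, k), r):
            yield {0, *extra}


def split_table(table, left):
    """Collapse the rows of `table` into a 2-row table (left group, right group)."""
    out = [[0] * len(table[0]), [0] * len(table[0])]
    for i, row in enumerate(table):
        target = out[0] if i in left else out[1]
        for j, c in enumerate(row):
            target[j] += c
    return out


def best_split(tables):
    """Search for the best binary split over all predictors.

    `tables` maps a predictor name to its count table.
    Returns (tau of best split, predictor, left group, number of splits evaluated).
    """
    ranked = sorted(tables, key=lambda name: tau(tables[name]), reverse=True)
    best = (-1.0, None, None)
    evaluated = 0
    for name in ranked:
        # Bound from the abstract: no split of `name` can exceed tau(tables[name]),
        # and later predictors in the ranking have an even smaller tau.
        if best[0] >= tau(tables[name]):
            break
        for left in binary_splits(len(tables[name])):
            evaluated += 1
            t = tau(split_table(tables[name], left))
            if t > best[0]:
                best = (t, name, left)
    return best + (evaluated,)


# Toy data (invented for illustration): two 4-category predictors observed
# on the same 120 cases, so both tables share the class totals (59, 61).
x1 = [[30, 0], [28, 2], [1, 29], [0, 30]]      # strongly associated with the class
x2 = [[15, 15], [14, 16], [16, 14], [14, 16]]  # almost unrelated to the class

best_tau, predictor, left, evaluated = best_split({"x1": x1, "x2": x2})
print(predictor, sorted(left), round(best_tau, 4), evaluated)
```

On this toy input the search examines only the 7 binary splits of `x1` and skips the 7 splits of `x2`, because the best split of `x1` already exceeds the overall τ of `x2`. An exhaustive search would examine all 14.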