A fast splitting procedure for classification trees

Authors:
Francesco Mola; Roberta Siciliano
Affiliations:
Dipartimento di Matematica e Statistica, Università degli Studi di Napoli Federico II, Monte S. Angelo, via Cintia, 80126 Napoli, Italy (f.mola@dmsna.dms.unina.it, r.sic@dmsna.dms.unina.it)
Venue:
Statistics and Computing
Year:
1997

Abstract

This paper presents a faster method for finding the best split at each node under the CART methodology. The predictability index τ is proposed as a splitting rule that grows the same classification tree as CART does when the Gini index of heterogeneity is used as the impurity measure. A theorem establishes a new property of the index: the τ of a given predictor is never lower than the τ of any split generated by that predictor. This property is used to substantially reduce the time required to grow a classification tree. Three simulation studies show the computational gain in terms of both the number of splits analysed at each node and the CPU time. An example shows that the proposed splitting algorithm is also computationally efficient on real data sets.
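
The bound stated in the abstract can be illustrated with a small Python sketch. It assumes that τ is the Goodman and Kruskal predictability index, computed from a contingency table whose rows are the categories of a nominal predictor and whose columns are the classes, and that a split is any grouping of the categories into two non-empty sets. The function names (tau, binary_splits, split_tau, best_split) and the stopping rule in best_split are illustrative assumptions about how such a bound could prune the search; they are not taken from the paper.

```python
from itertools import combinations

import numpy as np


def tau(table):
    """Goodman-Kruskal tau for predicting the class (columns) from the rows."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    row = p.sum(axis=1)
    col = p.sum(axis=0)
    keep = row > 0  # skip empty predictor categories
    within = np.sum(p[keep] ** 2 / row[keep, None])
    base = np.sum(col ** 2)  # 1 - base is the Gini index of the class margin
    return (within - base) / (1.0 - base)


def binary_splits(n_categories):
    """Yield each two-group split once, as the tuple of 'left' category indices."""
    others = range(1, n_categories)  # category 0 is always placed on the left
    for size in range(0, n_categories - 1):
        for rest in combinations(others, size):
            yield (0,) + rest


def split_tau(table, left):
    """Tau of the 2 x J table obtained by merging rows into left and right groups."""
    t = np.asarray(table, dtype=float)
    mask = np.zeros(t.shape[0], dtype=bool)
    mask[list(left)] = True
    collapsed = np.vstack([t[mask].sum(axis=0), t[~mask].sum(axis=0)])
    return tau(collapsed)


def best_split(tables):
    """Assumed pruning rule: visit predictors by decreasing tau and stop once
    the best split found so far is at least the tau of the next predictor."""
    order = sorted(range(len(tables)), key=lambda k: tau(tables[k]), reverse=True)
    best = (-1.0, None, None)  # (tau of split, predictor index, left categories)
    examined = 0
    for k in order:
        if best[0] >= tau(tables[k]):
            break  # no split of this or any later predictor can do better
        for left in binary_splits(len(tables[k])):
            examined += 1
            value = split_tau(tables[k], left)
            if value > best[0]:
                best = (value, k, left)
    return best, examined


# Hypothetical counts: one table per predictor, rows = categories, columns = classes.
tables = [
    [[30, 5], [10, 20], [5, 30]],
    [[25, 25], [20, 30]],
    [[12, 14], [11, 13], [10, 15], [12, 13]],
]
(best_value, predictor, left), examined = best_split(tables)
print(predictor, left, round(best_value, 4), examined)
```

Because the τ of a predictor bounds the τ of every split it generates, the loop in best_split can skip whole predictors without evaluating their splits, which is the kind of saving in "splits analysed at each node" that the abstract refers to.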