This paper is set in the framework of supervised statistical learning for data mining. In particular, multiple sets of inputs are used to predict an output on the basis of a training set. A typical data mining problem involves sets of inputs that are correlated within groups and large relative to the number of observed objects. Standard tree-based procedures yield unstable and hard-to-interpret solutions, especially when the relationships are complex. In such cases, multiple splits defined on a suitable combination of inputs are required. This paper provides a methodology for building a tree-based model in which node splitting is driven by factorial multiple splitting variables. A recursive partitioning algorithm is introduced that uses a two-stage splitting criterion based on linear discriminant functions. The result is a fast, automated procedure that searches for factorial multiple splits able to capture suitable directions in the variability among the sets of inputs. Real-world applications are discussed, and the results of a simulation study illustrate useful properties of the proposed methodology.
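To make the idea of a split defined on a combination of inputs more concrete, the following is a minimal sketch of one node split, not the paper's actual algorithm. The abstract does not specify the two-stage criterion, so the sketch assumes a binary output, a Fisher linear discriminant computed within each group of inputs (first stage), and a Gini-based choice of the best group and cut point (second stage). All function names, the ridge constant, and the toy data are hypothetical, and the sketch assumes both classes are present at the node.

```python
import numpy as np

def fisher_direction(X, y):
    """Fisher discriminant direction for a binary output y in {0, 1} (assumed setup)."""
    X0, X1 = X[y == 0], X[y == 1]
    D0, D1 = X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)
    # Pooled within-class scatter; a small ridge (assumed value) keeps it
    # invertible when the inputs in a group are strongly correlated.
    Sw = D0.T @ D0 + D1.T @ D1 + 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

def gini(y):
    """Gini impurity of a binary 0/1 vector."""
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def best_threshold(score, y):
    """Return (impurity decrease, threshold) of the best binary cut on a score."""
    order = np.argsort(score)
    s, t = score[order], y[order]
    n = len(t)
    parent = gini(t)
    best_gain, best_thr = 0.0, None
    for i in range(1, n):  # simple, unoptimised scan over candidate cut points
        if s[i] == s[i - 1]:
            continue
        child = (i * gini(t[:i]) + (n - i) * gini(t[i:])) / n
        if parent - child > best_gain:
            best_gain, best_thr = parent - child, 0.5 * (s[i] + s[i - 1])
    return best_gain, best_thr

def factorial_split(groups, y):
    """Stage 1: one discriminant score per input group. Stage 2: best cut over groups."""
    best = None
    for name, X in groups.items():
        w = fisher_direction(X, y)
        gain, thr = best_threshold(X @ w, y)
        if thr is not None and (best is None or gain > best["gain"]):
            best = {"group": name, "weights": w, "threshold": thr, "gain": gain}
    return best

# Toy data (hypothetical): group "a" carries the signal, group "b" is pure noise.
rng = np.random.default_rng(0)
n = 200
y = (rng.random(n) < 0.5).astype(int)
groups = {
    "a": rng.normal(size=(n, 2)) + 3.0 * y[:, None],
    "b": rng.normal(size=(n, 3)),
}
split = factorial_split(groups, y)
print(split["group"], split["threshold"], split["gain"])
```

Applying `factorial_split` recursively to the two resulting subsets would grow a tree whose nodes each split on a single linear combination of one group of inputs, which is the kind of interpretable multiple split the abstract motivates.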