Semi-random model tree ensembles: an effective and scalable regression method
AI'11 Proceedings of the 24th international conference on Advances in Artificial Intelligence
Predicting the value of a continuous variable as a function of several independent variables is one of the most important problems in data mining. A very large number of regression methods, both parametric and nonparametric, have been proposed in the past. However, since the list is quite extensive and many of these models make explicit, strong, yet different assumptions about the type of applicable problems and involve many parameters and options, choosing the appropriate regression methodology and then specifying the parameter values is a non-trivial, sometimes frustrating, task for data mining practitioners. Choosing an inappropriate methodology can produce rather disappointing results, which undermines the general utility of data mining software. For example, linear regression methods are straightforward and well understood, but because the linearity assumption is very strong, their performance degrades on complicated non-linear problems. Kernel-based methods perform quite well, but only if the kernel functions are selected correctly. In this paper, we propose a straightforward approach based on summarizing the training data using an ensemble of random decision trees. It requires very little knowledge from the user, yet is applicable to every type of regression problem that we are currently aware of. We have experimented on a wide range of problems, including those on which parametric methods perform well, a large selection of benchmark datasets for nonparametric regression, as well as highly non-linear stochastic problems. Our results are either significantly better than or comparable to those of many approaches that are known to perform well on these problems.
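To make the idea concrete, the following is a minimal sketch of a random-tree regression ensemble in the spirit the abstract describes: each tree picks its split feature and threshold at random rather than by optimization, leaves store the mean target of the samples reaching them, and the ensemble averages the trees' predictions. All names and the specific stopping rules here are illustrative assumptions, not the authors' exact algorithm.

```python
import random

def build_tree(X, y, max_depth, rng):
    """Grow a tree with randomly chosen split features and thresholds."""
    # Leaf: store the mean target of the samples reaching this node.
    if max_depth == 0 or len(y) < 4:
        return ("leaf", sum(y) / len(y))
    f = rng.randrange(len(X[0]))          # random split feature (assumption)
    values = [row[f] for row in X]
    lo, hi = min(values), max(values)
    if lo == hi:
        return ("leaf", sum(y) / len(y))
    t = rng.uniform(lo, hi)               # random split threshold (assumption)
    left = [i for i, row in enumerate(X) if row[f] <= t]
    left_set = set(left)
    right = [i for i in range(len(X)) if i not in left_set]
    if not left or not right:
        return ("leaf", sum(y) / len(y))
    return ("split", f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       max_depth - 1, rng),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       max_depth - 1, rng))

def predict_tree(node, x):
    """Route a sample down to a leaf and return the stored mean."""
    while node[0] == "split":
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node[1]

class RandomTreeEnsemble:
    """Averages the predictions of several independently grown random trees."""

    def __init__(self, n_trees=50, max_depth=8, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        self.trees = [build_tree(X, y, self.max_depth, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def predict(self, x):
        return sum(predict_tree(t, x) for t in self.trees) / len(self.trees)
```

Because no single tree searches for good splits, each one is a weak, cheap summary of the data; averaging many of them smooths out the randomness, which is what makes this family of methods both simple to configure and broadly applicable.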