Root attribute behavior within a random forest

Authors:
Thais Mayumi Oshiro;José Augusto Baranauskas
Affiliations:
Department of Computer Science and Mathematics, Faculty of Philosophy, Sciences and Languages at Ribeirao Preto, University of Sao Paulo, Brazil;Department of Computer Science and Mathematics, Faculty of Philosophy, Sciences and Languages at Ribeirao Preto, University of Sao Paulo, Brazil
Venue:
IDEAL'12 Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning
Year:
2012

Abstract

Random Forest is a computationally efficient technique that can operate quickly over large datasets. It has been used in many recent research projects and real-world applications in diverse domains. However, the associated literature provides little information about what happens in the trees within a Random Forest. The research reported here analyzes how frequently each attribute appears in the root node of the trees in a Random Forest, in order to find out whether all attributes are used with equal frequency or whether some are used more often than others. Additionally, we analyzed the estimated out-of-bag error of the trees to check whether the most used attributes perform well. Furthermore, we analyzed, using out-of-bag errors, whether pre-pruning influences the performance of the Random Forest. Our main conclusions are that the frequency of the attributes in the root node exhibits exponential behavior and that the estimated out-of-bag error can help to find relevant attributes within the forest. Concerning pre-pruning, we observed that execution time can be reduced without significant loss of performance.
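
The root-attribute count described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes discrete attributes, information gain as the split criterion, and a candidate subset of int(log2(M)) + 1 attributes per tree. The helper names (`root_attribute_counts`, `make_toy_data`) and the toy dataset are hypothetical. Only the root split of each tree is simulated, since that is the only node the count needs.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on one discrete attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def root_attribute_counts(rows, labels, n_trees=200, n_candidates=None, seed=0):
    """Count how often each attribute is chosen as the root split (hypothetical helper)."""
    rng = random.Random(seed)
    n_rows, n_attrs = len(rows), len(rows[0])
    if n_candidates is None:
        # Assumed subset size; the source does not state which value was used.
        n_candidates = int(math.log2(n_attrs)) + 1
    counts = Counter()
    for _ in range(n_trees):
        # Bootstrap sample: draw n_rows indices with replacement.
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Only a random subset of attributes competes for the root node.
        candidates = rng.sample(range(n_attrs), n_candidates)
        best = max(candidates, key=lambda a: information_gain(sample_rows, sample_labels, a))
        counts[best] += 1
    return counts


def make_toy_data(n_rows=300, n_attrs=8, seed=1):
    """Synthetic binary data: the class follows attribute 0 with about 10% label noise."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(n_rows)]
    labels = [row[0] if rng.random() > 0.1 else 1 - row[0] for row in rows]
    return rows, labels


rows, labels = make_toy_data()
counts = root_attribute_counts(rows, labels)
for attr, freq in counts.most_common():
    print(f"attribute {attr}: root in {freq} of 200 trees")
```

On this toy data the informative attribute should win the root whenever it is among the candidates, so the printed counts are strongly uneven. On a real dataset the same tally could be sorted and plotted to inspect the shape of the frequency distribution.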