Random Forest is a computationally efficient technique that can operate quickly over large datasets. It has been used in many recent research projects and real-world applications in diverse domains. However, the associated literature provides almost no guidance on how many trees should be used to compose a Random Forest. The research reported here analyzes whether there is an optimal number of trees within a Random Forest, i.e., a threshold beyond which increasing the number of trees brings no significant performance gain and only increases the computational cost. Our main conclusions are: growing the number of trees does not always produce a forest that performs significantly better than a smaller one, and simply doubling the number of trees is not worthwhile. It is also possible to state that there is a threshold beyond which there is no significant gain, unless a huge computational environment is available. In addition, we found an experimental relationship for the AUC gain obtained when doubling the number of trees in any forest. Furthermore, as the number of trees grows, the full set of attributes tends to be used within a Random Forest, which may be undesirable in the biomedical domain. Additionally, the density-based dataset metrics proposed here probably capture some aspects of the VC dimension of decision trees: low-density datasets may require large-capacity machines, while the opposite also seems to hold.
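The doubling experiment described above can be sketched as follows. This is a minimal illustration using scikit-learn and a synthetic dataset (an assumed stand-in for the benchmark datasets used in the study, which are not specified here); it trains forests of 2, 4, ..., 128 trees and reports the AUC change at each doubling step.

```python
# Sketch: how does AUC change when the number of trees in a Random Forest
# is repeatedly doubled? Synthetic data is used purely for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (hypothetical stand-in dataset).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

sizes = (2, 4, 8, 16, 32, 64, 128)
aucs = {}
for n_trees in sizes:
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(X_tr, y_tr)
    # AUC from the forest's predicted probability of the positive class.
    aucs[n_trees] = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

# The marginal gain from each doubling typically shrinks toward zero,
# consistent with the threshold behavior described in the abstract.
for small, large in zip(sizes[:-1], sizes[1:]):
    gain = aucs[large] - aucs[small]
    print(f"{small:>3} -> {large:>3} trees: AUC gain {gain:+.4f}")
```

On most runs the printed gains decay quickly, which is the pattern the study quantifies; the exact values depend on the dataset and random seed.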