Complexity Measures of Supervised Classification Problems
IEEE Transactions on Pattern Analysis and Machine Intelligence
Machine Learning
An introduction to ROC analysis
Pattern Recognition Letters - Special issue: ROC analysis in pattern recognition
Handbook of Parametric and Nonparametric Statistical Procedures
Handbook of Parametric and Nonparametric Statistical Procedures
Mining data with random forests: A survey and results of new tests
Pattern Recognition
Application of machine learning methods to spatial interpolation of environmental variables
Environmental Modelling & Software
Hi-index | 0.00 |
Random forest is a collection (ensemble) of decision trees. It is a popular ensemble technique in pattern recognition. In this article, we apply random forest for cancer classification based on gene expression and address two issues that have been so far overlooked in other works. First, we demonstrate on two different real-world datasets that the performance of random forest is strongly influenced by dataset complexity. When estimated before running random forest, this complexity can serve as a useful performance indicator and it can explain a difference in performance on different datasets. Second, we show that one should rely with caution on feature importance used to rank genes: two forests, generated with the different number of features per node split, may have very similar classification errors on the same dataset, but the respective lists of genes ranked according to feature importance can be weakly correlated.