C4.5: programs for machine learning
C4.5: programs for machine learning
Machine Learning
Machine Learning
Machine Learning
Machine Learning
Pruning Decision Trees with Misclassification Costs
ECML '98 Proceedings of the 10th European Conference on Machine Learning
The Case against Accuracy Estimation for Comparing Induction Algorithms
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Tree Induction for Probability-Based Ranking
Machine Learning
Inference for the Generalization Error
Machine Learning
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
An Improved Attribute Selection Measure for Decision Tree Induction
FSKD '07 Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 04
Learning decision tree for ranking
Knowledge and Information Systems
AUC: a statistically consistent and more discriminating measure than accuracy
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Scaling Up the Accuracy of Bayesian Network Classifiers by M-Estimate
ICIC '07 Proceedings of the 3rd International Conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence
Editorial: Modifications of the construction and voting mechanisms of the Random Forests Algorithm
Data & Knowledge Engineering
Hi-index | 0.00 |
The random forests (RF) algorithm, which combines the predictions from an ensemble of random trees, has achieved significant improvements in terms of classification accuracy. In many real-world applications, however, ranking is often required in order to make optimal decisions. Thus, we focus our attention on the ranking performance of RF in this paper. Our experimental results based on the entire 36 UC Irvine Machine Learning Repository (UCI) data sets published on the main website of Weka platform show that RF doesn't perform well in ranking, and is even about the same as a single C4.4 tree. This fact raises the question of whether several improvements to RF can scale up its ranking performance. To answer this question, we single out an improved random forests (IRF) algorithm. Instead of the information gain measure and the maximum-likelihood estimate, the average gain measure and the similarity-weighted estimate are used in IRF. Our experiments show that IRF significantly outperforms all the other algorithms used to compare in terms of ranking while maintains the high classification accuracy characterizing RF.