Margin optimization based pruning for random forest

Authors:
Fan Yang;Wei-Hang Lu;Lin-Kai Luo;Tao Li
Affiliations:
School of Information Science and Technology, Xiamen University, Xiamen 361005, China;School of Information Science and Technology, Xiamen University, Xiamen 361005, China;School of Information Science and Technology, Xiamen University, Xiamen 361005, China;School of Computer Science, Florida International University, Miami, FL 33199, USA
Venue:
Neurocomputing
Year:
2012

Abstract

This article introduces a margin-optimization-based pruning algorithm that reduces the ensemble size of a random forest while improving its performance. The key element of the algorithm is that it directly takes into account the margin distribution of the random forest on the training set. Four metrics based on the margin distribution are used to evaluate both the generalization ability of subensembles and the importance of individual classification trees in the ensemble. After a forest is built, its trees are ranked according to the margin metrics, and subensembles of decreasing size are then built by recursively removing the least important tree, one at a time. Experiments on 10 benchmark datasets demonstrate that the proposed algorithm can significantly improve generalization performance while simultaneously reducing the ensemble size. Furthermore, an empirical comparison with other pruning methods indicates that the margin distribution plays an important role in evaluating the performance of a random forest and can be used directly to select near-optimal subensembles.
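The rank-and-remove procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the four margin metrics, so the sketch assumes a single stand-in metric, the mean of the standard voting margin over the training set (share of votes for the true class minus the largest share for any other class). The names `margins`, `mean_margin`, `prune_by_margin`, and the toy vote matrix are all hypothetical.

```python
from collections import Counter


def margins(votes, labels, trees):
    """Voting margin of every sample under the sub-ensemble `trees`.

    votes[t][i] is the label predicted by tree t for training sample i.
    Margin = share of votes for the true label
             minus the largest share for any other label.
    """
    result = []
    for i, true_label in enumerate(labels):
        counts = Counter(votes[t][i] for t in trees)
        correct = counts.get(true_label, 0)
        wrong = max((c for lab, c in counts.items() if lab != true_label), default=0)
        result.append((correct - wrong) / len(trees))
    return result


def mean_margin(votes, labels, trees):
    """Assumed stand-in metric: average margin over the training set."""
    m = margins(votes, labels, trees)
    return sum(m) / len(m)


def prune_by_margin(votes, labels, metric=mean_margin, min_size=1):
    """Backward elimination driven by a margin metric.

    At each step the least important tree is taken to be the one whose
    removal leaves the highest metric value; it is dropped and the
    resulting sub-ensemble is recorded. Returns a list of
    (sub-ensemble, score) pairs with decreasing sub-ensemble size.
    """
    trees = list(range(len(votes)))
    path = [(tuple(trees), metric(votes, labels, trees))]
    while len(trees) > min_size:
        best_tree, best_score = None, None
        for t in trees:
            rest = [u for u in trees if u != t]
            score = metric(votes, labels, rest)
            if best_score is None or score > best_score:
                best_tree, best_score = t, score
        trees.remove(best_tree)
        path.append((tuple(trees), best_score))
    return path


# Toy data (hypothetical): 4 trees, 4 training samples, 2 classes.
votes = [
    [0, 1, 0, 1],  # tree 0: all correct
    [0, 1, 0, 0],  # tree 1: one error
    [1, 0, 1, 0],  # tree 2: all wrong
    [0, 1, 1, 1],  # tree 3: one error
]
labels = [0, 1, 0, 1]

path = prune_by_margin(votes, labels)
for subset, score in path:
    print(len(subset), subset, round(score, 3))
```

On this toy input the always-wrong tree is removed first, and the mean margin rises as the ensemble shrinks. In practice one would pick the subensemble along the path with the best metric value, and a held-out set would be needed to check whether the gain carries over to unseen data.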