A Comparison of Decision Tree Ensemble Creation Techniques

Authors:
Robert E. Banfield; Lawrence O. Hall; Kevin W. Bowyer; W. P. Kegelmeyer
Affiliations:
IEEE;IEEE;IEEE;IEEE
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2007

Abstract

We experimentally evaluate bagging and seven other randomization-based approaches to creating an ensemble of decision tree classifiers. Statistical tests were performed on experimental results from 57 publicly available data sets. When cross-validation comparisons were tested for statistical significance, the best method was statistically more accurate than bagging on only eight of the 57 data sets. Alternatively, examining the average ranks of the algorithms across the group of data sets, we find that boosting, random forests, and randomized trees are statistically significantly better than bagging. Because our results suggest that using an appropriate ensemble size is important, we introduce an algorithm that decides when a sufficient number of classifiers has been created for an ensemble. Our algorithm uses the out-of-bag error estimate, and is shown to result in an accurate ensemble for those methods that incorporate bagging into the construction of the ensemble.
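The abstract does not give the details of the stopping algorithm, so the following is only a minimal sketch of the general idea: grow a bagged ensemble one classifier at a time, track the out-of-bag (OOB) error, and stop once a smoothed version of that error has stopped improving. It is not the authors' algorithm. The function names `fit_stump` and `bag_until_oob_plateau`, the parameters `window` and `patience`, and the synthetic data are all assumptions made for illustration, and decision stumps stand in for full decision trees so that the example needs only the Python standard library.

```python
import random
from collections import Counter


def fit_stump(X, y, idx):
    """Fit a one-level tree (decision stump) on the bootstrap sample idx.

    Hypothetical stand-in for a full decision tree learner.
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({X[i][f] for i in idx}):
            left = Counter(y[i] for i in idx if X[i][f] <= t)
            right = Counter(y[i] for i in idx if X[i][f] > t)
            l_lab = left.most_common(1)[0][0]
            r_lab = right.most_common(1)[0][0] if right else l_lab
            # Training errors made when each side predicts its majority label.
            errors = len(idx) - left[l_lab] - right[r_lab]
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def bag_until_oob_plateau(X, y, max_trees=60, window=5, patience=10, seed=0):
    """Add bagged stumps until the smoothed OOB error stops improving.

    Assumed rule (not taken from the paper): average the last `window`
    OOB errors and stop when that average has not reached a new minimum
    for `patience` consecutive additions.
    """
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]  # OOB votes per training example
    ensemble, oob_errors = [], []
    best, best_at = float("inf"), 0
    for m in range(1, max_trees + 1):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        tree = fit_stump(X, y, idx)
        ensemble.append(tree)
        # Each tree votes only on the examples it did not train on.
        for i in range(n):
            if i not in in_bag:
                votes[i][tree(X[i])] += 1
        scored = [i for i in range(n) if votes[i]]
        wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
        oob_errors.append(wrong / len(scored) if scored else 1.0)
        recent = oob_errors[-window:]
        smoothed = sum(recent) / len(recent)
        if smoothed < best - 1e-12:
            best, best_at = smoothed, m
        elif m - best_at >= patience:
            break
    return ensemble, oob_errors


# Synthetic two-class problem: label is 1 above the diagonal x0 + x1 = 1.
data_rng = random.Random(1)
X = [(data_rng.random(), data_rng.random()) for _ in range(120)]
y = [int(a + b > 1.0) for a, b in X]

ensemble, oob_errors = bag_until_oob_plateau(X, y)
print(f"stopped at {len(ensemble)} classifiers, "
      f"final OOB error {oob_errors[-1]:.3f}")
```

Because the OOB estimate reuses the training data that each classifier left out of its bootstrap sample, a rule of this kind needs no separate validation set, which is consistent with the abstract's remark that the approach applies to methods that incorporate bagging.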