Efficient algorithms for decision tree cross-validation

Authors:
Hendrik Blockeel; Jan Struyf
Affiliations:
Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium (both authors)
Venue:
The Journal of Machine Learning Research
Year:
2003

Abstract

Cross-validation is a useful and generally applicable technique that is often employed in machine learning, including decision tree induction. An important disadvantage of a straightforward implementation is its computational overhead. In this paper we show that, for decision trees, this overhead can be reduced significantly by integrating cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this end and analyze the speedups these adaptations may yield. We identify a number of parameters that influence the obtainable speedups, and we validate and refine our analysis with experiments on a variety of data sets using two different implementations. Besides cross-validation, we also briefly explore the usefulness of these techniques for bagging. We conclude with guidelines on when these optimizations should be considered.
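
The abstract does not spell out how the integration works, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that one source of savings is the heavy overlap between the k training sets of a k-fold cross-validation: the counts needed to score a candidate test at a node can be gathered in a single pass over the data, and the counts for each fold's training set obtained by subtracting that fold's held-out counts from the overall counts. The function names (`fold_train_counts`, `naive_train_counts`) and the toy data are hypothetical.

```python
from collections import Counter


def fold_train_counts(test_outcomes, labels, folds, k):
    """Per-fold training-set counts of (test outcome, class label) pairs.

    Assumption for illustration: one pass accumulates the overall counts
    and the held-out counts of every fold; the training counts of fold i
    are then the overall counts minus the held-out counts of fold i.
    """
    total = Counter()
    held_out = [Counter() for _ in range(k)]
    for outcome, label, fold in zip(test_outcomes, labels, folds):
        key = (outcome, label)
        total[key] += 1
        held_out[fold][key] += 1
    # Counter subtraction keeps only strictly positive counts.
    return [total - held_out[i] for i in range(k)]


def naive_train_counts(test_outcomes, labels, folds, k):
    """Baseline: rescan the data once per fold, skipping held-out examples."""
    result = []
    for i in range(k):
        counts = Counter()
        for outcome, label, fold in zip(test_outcomes, labels, folds):
            if fold != i:
                counts[(outcome, label)] += 1
        result.append(counts)
    return result


# Toy data (hypothetical): outcome of one boolean test, class label, fold index.
outcomes = [True, True, False, False, True, False]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]
folds = [0, 1, 2, 0, 1, 2]

shared = fold_train_counts(outcomes, labels, folds, k=3)
naive = naive_train_counts(outcomes, labels, folds, k=3)
print(shared == naive)
```

In this sketch the shared version reads each example once regardless of k, whereas the baseline reads it k times. Whether and how much such sharing pays off in a full tree learner depends on factors the abstract only alludes to as parameters that influence the obtainable speedups.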