Predicting Nearly As Well As the Best Pruning of a Decision Tree

  • Authors:
  • David P. Helmbold; Robert E. Schapire

  • Affiliations:
  • Computer and Information Sciences, University of California, Santa Cruz, CA 95064. E-mail: dph@cse.ucsc.edu; AT&T Labs, 600 Mountain Avenue, Room 2A-424, Murray Hill, NJ 07974. E-mail: schapire@research.att.com

  • Venue:
  • Machine Learning - Special issue on the Eighth Annual Conference on Computational Learning Theory (COLT '95)
  • Year:
  • 1997

Abstract

Many algorithms for inferring a decision tree from data involve a two-phase process: First, a very large decision tree is grown which typically ends up “over-fitting” the data. To reduce over-fitting, in the second phase, the tree is pruned using one of a number of available methods. The final tree is then output and used for classification on test data.

In this paper, we suggest an alternative approach to the pruning phase. Using a given unpruned decision tree, we present a new method of making predictions on test data, and we prove that our algorithm’s performance will not be “much worse” (in a precise technical sense) than the predictions made by the best reasonably small pruning of the given decision tree. Thus, our procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust.

Our method can be viewed as a synthesis of two previously studied techniques. First, we apply Cesa-Bianchi et al.’s (1993) results on predicting using “expert advice” (where we view each pruning as an “expert”) to obtain an algorithm that has provably low prediction loss, but that is computationally infeasible. Next, we generalize and apply a method developed by Buntine (1990, 1992) and Willems, Shtarkov and Tjalkens (1993, 1995) to derive a very efficient implementation of this procedure.
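To make the “expert advice” view concrete, the following is a minimal sketch of a generic exponential-weights scheme in which each pruning is treated as an expert: predictions are a weighted vote over prunings, and each pruning is down-weighted according to its loss after the true label is revealed. This is an illustrative sketch only, not the paper’s algorithm; the function names, the representation of prunings as callables, and the learning rate eta are assumptions introduced here for illustration.

    import math

    def weighted_expert_prediction(prunings, weights, x):
        """Predict on x by a weighted vote over all prunings (the 'experts')."""
        total = sum(weights)
        vote = sum(w * p(x) for p, w in zip(prunings, weights))
        return 1 if vote / total >= 0.5 else 0

    def update_weights(prunings, weights, x, y, eta=0.5):
        """After seeing the true label y, exponentially penalize each pruning
        in proportion to its prediction loss on (x, y)."""
        return [w * math.exp(-eta * abs(p(x) - y))
                for p, w in zip(prunings, weights)]

    # Toy usage: two hypothetical "prunings" as 0/1 classifiers on one feature.
    prunings = [lambda x: 1 if x[0] > 0 else 0,   # non-trivial pruning
                lambda x: 1]                       # trivial pruning (always 1)
    weights = [1.0, 1.0]                           # uniform initial weights
    for x, y in [([0.3], 1), ([-0.2], 0), ([0.8], 1)]:
        print(weighted_expert_prediction(prunings, weights, x), y)
        weights = update_weights(prunings, weights, x, y)

Run explicitly like this, the scheme is infeasible for real trees, since the number of prunings grows exponentially with tree size; the paper’s contribution is an efficient implementation that maintains these weights implicitly over all prunings, using the technique it attributes to Buntine and to Willems, Shtarkov and Tjalkens.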