COLT '90 Proceedings of the third annual workshop on Computational learning theory
C4.5: programs for machine learning
Universal forecasting algorithms
Information and Computation
The weighted majority algorithm
Information and Computation
Weakly learning DNF and characterizing statistical query learning using Fourier analysis
STOC '94 Proceedings of the twenty-sixth annual ACM symposium on Theory of computing
Exact learning Boolean functions via the monotone theory
Information and Computation
On the boosting ability of top-down decision tree learning algorithms
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Predicting Nearly As Well As the Best Pruning of a Decision Tree
Machine Learning - Special issue on the eighth annual conference on computational learning theory, (COLT '95)
Journal of the ACM (JACM)
Using and combining predictors that specialize
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Derandomizing stochastic prediction strategies
COLT '97 Proceedings of the tenth annual conference on Computational learning theory
An efficient extension to mixture techniques for prediction and decision trees
COLT '97 Proceedings of the tenth annual conference on Computational learning theory
A game of prediction with expert advice
Journal of Computer and System Sciences - Special issue on the eighth annual workshop on computational learning theory, July 5–8, 1995
ALT '97 Proceedings of the 8th International Conference on Algorithmic Learning Theory
Tight worst-case loss bounds for predicting with expert advice
Universal portfolios with side information
IEEE Transactions on Information Theory
On-Line Algorithm to Predict Nearly as Well as the Best Pruning of a Decision Tree
Progress in Discovery Science, Final Report of the Japanese Discovery Science Project
Path kernels and multiplicative updates
The Journal of Machine Learning Research
Helmbold and Schapire gave an on-line prediction algorithm that, when given an unpruned decision tree, produces predictions not much worse than those made by the best pruning of that tree. In this paper, we give two new on-line algorithms. The first is based on the observation that, in the "batch" setting where all the data to be predicted are given in advance, the best pruning can be found efficiently by dynamic programming. This algorithm works for a wide class of loss functions, whereas the algorithm of Helmbold and Schapire is described only for the absolute loss function. Moreover, our algorithm is simple and general enough to apply to many other on-line optimization problems solved by dynamic programming. The second algorithm is competitive not only with the best pruning but also with the best prediction values associated with the nodes of the decision tree. In this setting, we give a greatly simplified algorithm for the absolute loss function; it generalizes easily to the case where data are classified in some arbitrarily fixed manner rather than by a decision tree.
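The batch observation underlying the first algorithm can be sketched as follows. This is a hedged illustration, not the paper's algorithm itself: the recurrence simply says that the best pruning rooted at a node either cuts there (paying that node's loss as a leaf) or keeps the node internal and combines the best prunings of its children. The names `Node`, `leaf_loss`, and `children` are hypothetical, chosen for the sketch.

```python
class Node:
    """A decision-tree node with a precomputed loss if pruned here."""
    def __init__(self, leaf_loss, children=()):
        self.leaf_loss = leaf_loss   # total loss on the data if the subtree is replaced by a leaf
        self.children = children     # empty tuple for a leaf of the unpruned tree

def best_pruning_loss(node):
    """Dynamic program: minimum total loss over all prunings of the subtree."""
    if not node.children:
        return node.leaf_loss
    # Loss if we keep this node internal: sum of the children's best prunings.
    keep = sum(best_pruning_loss(c) for c in node.children)
    # Either prune at this node or keep it internal, whichever is cheaper.
    return min(node.leaf_loss, keep)
```

For example, a root with leaf loss 5.0 whose two children have leaf losses 1.0 and 3.0 yields `min(5.0, 1.0 + 3.0) = 4.0`, i.e. the best pruning keeps the split. The on-line algorithms in the paper maintain, in effect, a weighted version of this recursion without seeing the data in advance.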