Text compression
COLT '90 Proceedings of the third annual workshop on Computational learning theory
Learning probabilistic prediction functions
COLT '88 Proceedings of the first annual workshop on Computational learning theory
C4.5: programs for machine learning
The weighted majority algorithm
Information and Computation
The power of amnesia: learning probabilistic automata with variable memory length
Machine Learning - Special issue on COLT '94
Predicting Nearly As Well As the Best Pruning of a Decision Tree
Machine Learning - Special issue on the eighth annual conference on computational learning theory (COLT '95)
Journal of the ACM (JACM)
A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences - Special issue on the 26th annual ACM Symposium on Theory of Computing (STOC '94), May 23–25, 1994, and the second annual European Conference on Computational Learning Theory (EuroCOLT '95), March 13–15, 1995
Adaptive mixtures of probabilistic transducers
Neural Computation
Statistical methods for speech recognition
A universal finite memory source
IEEE Transactions on Information Theory
The context-tree weighting method: basic properties
IEEE Transactions on Information Theory
Efficiently Approximating Weighted Sums with Exponentially Many Terms
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Learning theory and language modeling
Exploring artificial intelligence in the new millennium
Detecting errors within a corpus using anomaly detection
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
On approximating weighted sums with exponentially many terms
Journal of Computer and System Sciences
Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees
The Journal of Machine Learning Research
Learning prediction suffix trees with Winnow
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
An analysis of reduced error pruning
Journal of Artificial Intelligence Research
Individual sequence prediction using memory-efficient context trees
IEEE Transactions on Information Theory
Being Bayesian about network structure
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Tracking the best of many experts
COLT'05 Proceedings of the 18th annual conference on Learning Theory
We present an efficient method for maintaining mixtures of prunings of a prediction or decision tree that extends the previous methods for “node-based” prunings (Buntine, 1990; Willems, Shtarkov, & Tjalkens, 1995; Helmbold & Schapire, 1997; Singer, 1997) to the larger class of edge-based prunings. The method includes an online weight-allocation algorithm that can be used for prediction, compression, and classification. Although the set of edge-based prunings of a given tree is much larger than the set of node-based prunings, our algorithm has space and time complexity similar to that of previous mixture algorithms for trees. Using the general online framework of Freund and Schapire (1997), we prove that our algorithm correctly maintains the mixture weights for edge-based prunings under any bounded loss function. We also give a corresponding algorithm and weight-allocation scheme for the logarithmic loss function. Finally, we describe experiments comparing node-based and edge-based mixture models for estimating the probability of the next word in English text; the results show the advantages of edge-based models.
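The abstract describes the algorithm only at a high level. As a rough illustration of the general online framework it builds on (Freund & Schapire, 1997), the following minimal Python sketch runs Hedge-style multiplicative weight updates over an explicitly enumerated set of experts. The expert set, loss values, and learning rate eta are illustrative assumptions; the paper's contribution is maintaining such a mixture implicitly over the exponentially large set of edge-based prunings, which this toy version does not attempt.

import numpy as np

def hedge(expert_losses, eta=0.5):
    """Hedge (Freund & Schapire, 1997) over T rounds.

    expert_losses: array of shape (T, n), per-round losses in [0, 1]
                   for each of n experts (here, hypothetical prunings).
    eta:           learning rate (illustrative choice).
    Returns the final normalized weight vector over the experts.
    """
    T, n = expert_losses.shape
    weights = np.full(n, 1.0 / n)        # uniform prior over experts
    for t in range(T):
        # Multiplicative update: low-loss experts retain more weight.
        weights *= np.exp(-eta * expert_losses[t])
        weights /= weights.sum()         # renormalize to a mixture
    return weights

# Toy usage: 3 hypothetical "prunings", 100 rounds of synthetic losses.
rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 1.0, size=(100, 3))
losses[:, 1] *= 0.5                      # expert 1 is consistently better
print(hedge(losses))                     # weight concentrates on expert 1

With losses bounded in [0, 1], Hedge guarantees that the mixture's cumulative loss stays close to that of the best single expert in hindsight, which is the kind of bound the paper proves for its implicit mixture over edge-based prunings.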