Context trees are a popular and effective tool for tasks such as compression, sequential prediction, and language modeling. We present an algebraic perspective on context trees for the task of individual sequence prediction. Our approach stems from a generalization of the notion of margin used for linear predictors. By extending the concept of margin to context trees, we can cast the individual sequence prediction problem as the task of finding a linear separator in a Hilbert space, and apply techniques from machine learning and online optimization to this problem. Our main contribution is a memory-efficient adaptation of the perceptron algorithm for individual sequence prediction. We name our algorithm the shallow perceptron and prove a shifting mistake bound, which relates its performance to that of any sequence of context trees. We also prove that the shallow perceptron grows its context tree at a rate that is upper bounded by its mistake rate, which in turn bounds the size of the trees it produces.
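The abstract describes the approach only at a high level. As a rough illustration of the two ideas it names (predicting with a linear function over suffix-indexed weights, and growing the tree only when a mistake is made), here is a minimal, hypothetical Python sketch. The class name ContextTreePerceptron, the decay and max_depth parameters, and the exact update rule are assumptions made for illustration; this is not the paper's shallow perceptron algorithm.

```python
# Hypothetical sketch of a perceptron-style context-tree predictor for
# binary sequences over {-1, +1}. Illustrative only; NOT the paper's
# exact shallow perceptron. Each context (a suffix of the history) is
# one coordinate of a feature space, so prediction is linear there.

class ContextTreePerceptron:
    def __init__(self, decay=0.5, max_depth=10):
        self.weights = {(): 0.0}   # one weight per context, keyed by suffix tuple
        self.decay = decay         # per-depth scaling of suffix features (assumed)
        self.max_depth = max_depth # cap on context length (assumed)

    def _suffixes(self, history):
        # The contexts of a history are its suffixes, shortest first.
        for d in range(min(len(history), self.max_depth) + 1):
            yield tuple(history[len(history) - d:])

    def predict(self, history):
        # Margin = sum of weights over suffixes already in the tree,
        # scaled by decay**depth: a linear predictor in suffix space.
        margin = sum(self.decay ** len(s) * self.weights[s]
                     for s in self._suffixes(history)
                     if s in self.weights)
        return 1 if margin >= 0 else -1

    def update(self, history, symbol):
        # Perceptron step: only on a mistake, add the (decayed) feature
        # vector of the current history, creating missing nodes. The
        # tree therefore grows only when a mistake occurs, mirroring
        # the growth-vs-mistakes property stated in the abstract.
        if self.predict(history) != symbol:
            for s in self._suffixes(history):
                self.weights[s] = (self.weights.get(s, 0.0)
                                   + symbol * self.decay ** len(s))
```

A short usage run under the same assumptions: feed the sequence one symbol at a time, predicting each symbol from the history before it.

```python
seq = [1, -1, 1, -1, 1, -1, 1, -1]
model = ContextTreePerceptron(decay=0.5, max_depth=4)
mistakes = 0
for t in range(len(seq)):
    history, symbol = seq[:t], seq[t]
    if model.predict(history) != symbol:
        mistakes += 1
    model.update(history, symbol)
print(f"{mistakes} mistakes, {len(model.weights)} tree nodes")
```

Because update adds nodes only on mistaken rounds, and at most max_depth + 1 per round, the tree size in this sketch is bounded by a constant times the mistake count, a toy analogue of the bound the abstract claims.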