An efficient universal prediction algorithm for unknown sources with limited training data

Authors:
J. Ziv
Affiliations:
Dept. of Electr. Eng., Technion-Israel Inst. of Technol., Haifa
Venue:
IEEE Transactions on Information Theory
Year:
2006

Citing 0
Cited 1

Analyzing a class of pseudo-random bit generator through inductive machine learning paradigm

Intelligent Data Analysis

Quantified Score

Hi-index	754.84

Visualization

Abstract

Inspired by C. E. Shannon's celebrated paper: "Prediction and entropy of printed English" (1951), we consider the optimal prediction error for unknown finite-alphabet ergodic Markov sources, for prediction algorithms that make inference about the most probable incoming letter, where the distribution of the unknown source is apparent only via a short training sequence of N + 1 letters. We allow N to be a polynomial function of K, the order of the Markov source, rather than the classical case where N is allowed to be exponential in K. A lower bound on the prediction error is formulated for such universal prediction algorithms, that are based on suffixes that were observed somewhere in the past "training sequence" X-N-1 (i.e. it is assumed that the universal predictor, given the past (N + 1)-sequence which serves as a training sequence is no better than the optimal predictor given only the longest suffix that appeared somewhere in the past X-N -1 vector). For a class of stationary Markov sources (which includes all Markov sources with positive transition probabilities), a particular universal predictor is introduced, and it is demonstrated that its performance is "optimal" in the sense that it yields a prediction error which is close to the lower bound on the universal prediction error, with limited training data. The results are nonasymptotic in the sense that they express the effect of limited training data on the efficiency of universal predictors. An asymptotically optimal universal predictor which is based on pattern matching appears elsewhere in the literature. However, the prediction error of these algorithms does not necessarily come close to the lower bound in the nonasymptotic region