A universal predictor based on pattern matching

Authors:
P. Jacquet; W. Szpankowski; I. Apostol
Affiliations:
Inst. Nat. de Recherche en Inf. et Autom., Le Chesnay; -; -
Venue:
IEEE Transactions on Information Theory
Year:
2006

Abstract

We consider a universal predictor based on pattern matching. Given a sequence X_1, ..., X_n drawn from a stationary mixing source, it predicts the next symbol X_{n+1} by selecting a context for X_{n+1}. The predictor, called sampled pattern matching (SPM), is a modification of the Ehrenfeucht-Mycielski (1992) pseudorandom generator algorithm. It predicts the most frequent symbol appearing at the so-called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X_1 X_2 ··· X_n; that is, in SPM, context selection consists of taking a certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by Wyner and Ziv in their seminal 1989 paper. Here, we estimate the redundancy of the SPM universal predictor; that is, we prove that the probability that the SPM predictor makes a worse decision than the optimal predictor is O(n^{-ν}) for some 0 < ν < 1/2 as n → ∞. As a matter of fact, we show that we can predict K = O(1) symbols with the same probability of error.
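
The prediction rule described in the abstract can be illustrated with a short Python sketch. This is a naive quadratic-time illustration written from the abstract's description only, not the authors' implementation. The function names, the parameter name `alpha` for the fraction of the longest match, its default value 0.75, the rounding with `ceil`, and the tie-breaking rule are all assumptions made for this example.

```python
import math
from collections import Counter


def longest_repeated_suffix_len(seq):
    """Length of the longest suffix of seq that has another copy inside seq.

    Copies may overlap the suffix itself; only the starting position must differ.
    """
    n = len(seq)
    best = 0
    for k in range(1, n):
        suffix = seq[n - k:]
        # Search for a copy starting strictly before the suffix's own start.
        if any(seq[i:i + k] == suffix for i in range(n - k)):
            best = k
        else:
            # If the length-k suffix has no other copy, no longer suffix has one.
            break
    return best


def spm_predict(seq, alpha=0.75):
    """Predict the next symbol of seq, or None if no suffix repeats.

    alpha (hypothetical name and default) is the fraction of the longest
    match kept as the context.
    """
    n = len(seq)
    longest = longest_repeated_suffix_len(seq)
    if longest == 0:
        return None
    # Shortened context: the last ceil(alpha * longest) symbols (assumed rounding).
    k = max(1, math.ceil(alpha * longest))
    context = seq[n - k:]
    # Sampled positions: the symbols right after each earlier copy of the context.
    followers = [seq[i + k] for i in range(n - k) if seq[i:i + k] == context]
    # Majority vote; ties go to the symbol seen first (an assumption).
    return Counter(followers).most_common(1)[0][0]
```

For example, `spm_predict("abababab")` finds the longest repeated suffix `ababab`, shortens it to the context `babab`, and returns the symbol that followed its earlier copy. A practical implementation would presumably use a suffix tree or similar index rather than the repeated scans used here.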