For problems of data compression, gambling, and prediction of individual sequences $x_1, \dots, x_n$, the following questions arise. Given a target family of probability mass functions $p(x_1, \dots, x_n \mid \theta)$, how do we choose a probability mass function $q(x_1, \dots, x_n)$ so that it approximately minimizes the maximum regret
$$\max_{x_1, \dots, x_n} \left( \log \frac{1}{q(x_1, \dots, x_n)} - \log \frac{1}{p(x_1, \dots, x_n \mid \hat\theta)} \right),$$
and so that it achieves the best constant $C$ in the asymptotics of the minimax regret, which is of the form $(d/2)\log(n/2\pi) + C + o(1)$, where $d$ is the parameter dimension? Are there easily implementable strategies $q$ that achieve those asymptotics? And how does the solution of the worst-case sequence problem relate to the solution of the corresponding expectation version
$$\min_q \max_\theta E_\theta \left( \log \frac{1}{q(x_1, \dots, x_n)} - \log \frac{1}{p(x_1, \dots, x_n \mid \theta)} \right)?$$
In the discrete memoryless case, with a given alphabet of size $m$, the Bayes procedure with the Dirichlet$(1/2, \dots, 1/2)$ prior is asymptotically maximin. Simple modifications of it are shown to be asymptotically minimax. The best constant is $C_m = \log\bigl(\Gamma(1/2)^m / \Gamma(m/2)\bigr)$, which agrees with the logarithm of the integral of the square root of the determinant of the Fisher information. Moreover, our asymptotically optimal strategies for the worst-case problem are also asymptotically optimal for the expectation version. Analogous conclusions are given for the case of prediction, gambling, and compression when, for each observation, one has access to side information from an alphabet of size $k$. In this setting the minimax regret is shown to be
$$\frac{k(m-1)}{2} \log \frac{n}{2\pi k} + k C_m + o(1).$$
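To make the discrete memoryless case concrete, the sketch below (not from the paper; the function names and brute-force enumeration are illustrative assumptions) computes the exact minimax regret $\log \sum_{x^n} p\bigl(x^n \mid \hat\theta(x^n)\bigr)$ for an $m$-ary alphabet by summing the maximized likelihood over count vectors, and compares it with the asymptotic formula $\frac{m-1}{2}\log\frac{n}{2\pi} + C_m$, with all logarithms in nats.

```python
# A minimal sketch, not from the paper: it compares the exact minimax regret
# for an m-ary memoryless source with the asymptotic formula
# (m-1)/2 * log(n/(2*pi)) + C_m, all in nats. The function names and the
# brute-force enumeration are illustrative choices, not the authors' method.
import math

def count_vectors(n, m):
    """Yield every m-tuple of nonnegative counts summing to n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_vectors(n - first, m - 1):
            yield (first,) + rest

def exact_minimax_regret(n, m):
    """log of sum_{x^n} p(x^n | mle(x^n)), grouped by type class:
    sum over count vectors of  n!/(n_1!...n_m!) * prod_i (n_i/n)^{n_i}."""
    total = 0.0
    for counts in count_vectors(n, m):
        log_term = math.lgamma(n + 1)          # log n!
        for c in counts:
            log_term -= math.lgamma(c + 1)     # minus log n_i!
            if c > 0:
                log_term += c * math.log(c / n)  # maximized likelihood factor
        total += math.exp(log_term)
    return math.log(total)

def asymptotic_minimax_regret(n, m):
    """(d/2) log(n/(2*pi)) + C_m with d = m - 1 and
    C_m = log(Gamma(1/2)^m / Gamma(m/2))."""
    c_m = m * math.lgamma(0.5) - math.lgamma(m / 2)
    return 0.5 * (m - 1) * math.log(n / (2 * math.pi)) + c_m

if __name__ == "__main__":
    m = 3
    for n in (10, 100, 1000):
        print(f"n={n:5d}  exact={exact_minimax_regret(n, m):.4f}  "
              f"asymptotic={asymptotic_minimax_regret(n, m):.4f}")
```

Under these assumptions, the gap between the two columns should shrink as $n$ grows, illustrating the $o(1)$ term in the expansion quoted above.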