We show that the normalized maximum-likelihood (NML) distribution, as a universal code for a parametric class of models, is closest to the negative logarithm of the maximized likelihood in the mean code-length distance, where the mean is taken with respect to the worst-case model inside or outside the parametric class. We strengthen this result by showing that, when the data-generating models are restricted to the most “benevolent” ones, in the sense that they incorporate all the constraints in the data and no more, the bound cannot essentially be beaten by any code, except when the mean is taken with respect to data-generating models in a set of vanishing size. These results allow us to decompose the code for the data into two parts: the first carries all the useful information in the data that can be extracted with the model family in question, while the remainder carries none, and we thereby obtain a measure of the (useful) information in the data.
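
As a small numerical illustration of the decomposition described above, the sketch below computes the NML code length for the Bernoulli model family on binary strings of length n, where the NML distribution is P_NML(x^n) = P(x^n; theta_hat(x^n)) / C_n and its code length splits as -log P_NML(x^n) = -log P(x^n; theta_hat(x^n)) + log C_n. The choice of the Bernoulli family, the function names, and the example string are ours and not taken from the paper; this is a minimal sketch of the standard two-part split (maximized-likelihood term plus log-normalizer, often called the parametric complexity), not an implementation of the paper's results.

# Illustrative sketch (assumption: Bernoulli family, code lengths in bits).
import math

def max_log_likelihood(k, n):
    # log2 P(x^n; theta_hat) for a binary string with k ones, theta_hat = k / n.
    if k == 0 or k == n:
        return 0.0  # maximized-likelihood probability is 1 in the degenerate cases
    p = k / n
    return k * math.log2(p) + (n - k) * math.log2(1 - p)

def log_normalizer(n):
    # log2 C_n, with C_n = sum_k C(n, k) * (k/n)^k * ((n-k)/n)^(n-k).
    total = sum(math.comb(n, k) * 2.0 ** max_log_likelihood(k, n) for k in range(n + 1))
    return math.log2(total)

def nml_code_length(x):
    # -log2 P_NML(x) = -log2 P(x; theta_hat(x)) + log2 C_n.
    n, k = len(x), sum(x)
    data_part = -max_log_likelihood(k, n)   # fit of the best model in the family
    model_part = log_normalizer(n)          # log-normalizer (parametric complexity)
    return data_part + model_part

x = [1, 0, 1, 1, 0, 1, 1, 1]
print(nml_code_length(x))

For this short string the data-fit term dominates; as n grows, the log-normalizer grows only logarithmically for a one-parameter family, which is what makes it a natural candidate for a measure of the information the family can extract from the data.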