Effects of compression on language evolution
Artificial Life
Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences
EMCL '01 Proceedings of the 12th European Conference on Machine Learning
Towards an Algorithmic Statistics
ALT '00 Proceedings of the 11th International Conference on Algorithmic Learning Theory
ISAAC '02 Proceedings of the 13th International Symposium on Algorithms and Computation
On different facets of regularization theory
Neural Computation
Journal of Logic, Language and Information
Optimality of universal Bayesian sequence prediction for general loss and alphabet
The Journal of Machine Learning Research
Predictability, Complexity, and Learning
Neural Computation
MDL convergence speed for Bernoulli sequences
Statistics and Computing
Interestingness measures for data mining: A survey
ACM Computing Surveys (CSUR)
Compact representations as a search strategy: compression EDAs
Theoretical Computer Science - Foundations of genetic algorithms
On generalized computable universal priors and their convergence
Theoretical Computer Science - Algorithmic learning theory
Denoising using local projective subspace methods
Neurocomputing
On semimeasures predicting Martin-Löf random sequences
Theoretical Computer Science
Ockham's razor, empirical complexity, and truth-finding efficiency
Theoretical Computer Science
A non-parametric approach to simplicity clustering
Applied Artificial Intelligence
Prefetching based on web usage mining
Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware
Hierarchical Extraction of Independent Subspaces of Unknown Dimensions
ICA '09 Proceedings of the 8th International Conference on Independent Component Analysis and Signal Separation
Occam's Razor and a non-syntactic measure of decision tree complexity
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence
Using Kolmogorov complexity for understanding some limitations on steganography
ISIT'09 Proceedings of the 2009 IEEE International Symposium on Information Theory - Volume 4
Sequential predictions based on algorithmic complexity
Journal of Computer and System Sciences
Computable Bayesian compression for uniformly discretizable statistical models
ALT'09 Proceedings of the 20th international conference on Algorithmic learning theory
Compression and learning in linear regression
ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems
Optimal Bayesian 2D-discretization for variable ranking in regression
DS'06 Proceedings of the 9th international conference on Discovery Science
Compact genetic codes as a search strategy of evolutionary processes
FOGA'05 Proceedings of the 8th international conference on Foundations of Genetic Algorithms
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles of minimum description length (MDL) and minimum message length (MML), abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the fundamental inequality, which in broad terms states that the principle is valid when the data are random relative to every contemplated hypothesis, and these hypotheses are in turn random relative to the (universal) prior. The ideal principle states that the prior probability associated with a hypothesis should be given by the algorithmic universal probability, and that the sum of the negative log universal probability of the model and the negative log probability of the data given the model (the two-part code length) should be minimized. If we restrict the model class to finite sets, then application of the ideal principle reduces to Kolmogorov's minimal sufficient statistic. In general, we show that data compression is almost always the best strategy, both in model selection and in prediction.
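A compact restatement of the selection rule described in the abstract, as a sketch in standard notation (assumed here, not taken from the abstract itself): m denotes the universal algorithmic probability, K prefix Kolmogorov complexity, and "=+" equality up to an additive constant; the precise constants are those of the paper's fundamental inequality.

% Sketch of the ideal MDL selection rule from the abstract above.
% Notation assumed: \mathbf{m} = universal probability, K = prefix
% Kolmogorov complexity, H ranging over the contemplated hypotheses.
\[
  H_{\mathrm{MDL}}
    \;=\; \arg\min_{H}\,\bigl[\,-\log \mathbf{m}(H) \;-\; \log P(D \mid H)\,\bigr]
    \;\stackrel{+}{=}\; \arg\min_{H}\,\bigl[\,K(H) \;-\; \log P(D \mid H)\,\bigr],
\]
% since -log m(H) = K(H) + O(1) by the coding theorem.
% The fundamental inequality requires, in broad terms, two randomness
% (typicality) conditions:
\[
  -\log P(D \mid H) \;\stackrel{+}{=}\; K(D \mid H)
  \quad\text{(data random w.r.t. $H$)},
  \qquad
  -\log P(H) \;\stackrel{+}{=}\; K(H)
  \quad\text{(hypothesis random w.r.t. the prior)},
\]
% where the second condition holds for every H when the prior is the
% universal probability, P(H) = m(H), again by the coding theorem.

In the finite-set case mentioned in the abstract, taking the models to be finite sets S containing the data D with P(D | S) uniform on S gives the two-part code length K(S) + log |S|, whose minimization yields Kolmogorov's minimal sufficient statistic.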