Model selection by sequentially normalized least squares

Authors:
Jorma Rissanen;Teemu Roos;Petri Myllymäki
Affiliations:
Helsinki Institute for Information Technology HIIT, Finland;Helsinki Institute for Information Technology HIIT, Finland;Helsinki Institute for Information Technology HIIT, Finland
Venue:
Journal of Multivariate Analysis
Year:
2010

Abstract

Model selection by means of the predictive least squares (PLS) principle has been thoroughly studied in the context of regression model selection and autoregressive (AR) model order estimation. We introduce a new criterion based on sequentially minimized squared deviations, which are smaller than both the usual least squares and the squared prediction errors used in PLS. We also prove that our criterion has a probabilistic interpretation as a model that is asymptotically optimal within the given class of distributions: it reaches the lower bound on the logarithmic prediction errors, which is given by the so-called stochastic complexity and approximated by BIC. This holds when the regressor (design) matrix is non-random or determined by the observed data, as in AR models. The advantages of the criterion include the fact that it can be evaluated efficiently and exactly, without asymptotic approximations, and, importantly, that there are no adjustable hyper-parameters, which makes it applicable to both small and large amounts of data.
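
The ordering of the three error sums mentioned in the abstract can be checked numerically. The sketch below is illustrative only and is not the paper's criterion. The abstract gives no formulas, so it assumes that the sequentially minimized deviation at step t is the residual of y_t under the least squares fit to the first t observations, and that the PLS error at step t uses the fit to the first t-1 observations. The function name `sequential_deviations` and the synthetic data are hypothetical.

```python
import numpy as np

def sequential_deviations(X, y):
    """Return (sequential sum, ordinary RSS, PLS sum) for a linear model.

    Assumption (not taken from the abstract): the sequentially minimized
    deviation at step t is the residual of y[t] under the least squares
    fit to observations 0..t, while PLS uses the fit to observations 0..t-1.
    """
    n, k = X.shape
    seq = 0.0  # squared residuals with the current point included in the fit
    pls = 0.0  # squared one-step-ahead prediction errors
    # Start once k observations are available, so the previous fit is determined.
    for t in range(k, n):
        b_prev, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        b_curr, *_ = np.linalg.lstsq(X[:t + 1], y[:t + 1], rcond=None)
        pls += (y[t] - X[t] @ b_prev) ** 2
        seq += (y[t] - X[t] @ b_curr) ** 2
    # Ordinary least squares residual sum of squares on the full sample.
    b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ b_full) ** 2))
    return float(seq), rss, float(pls)

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

seq, rss, pls = sequential_deviations(X, y)
print(f"sequential: {seq:.3f}  least squares: {rss:.3f}  PLS: {pls:.3f}")
```

Under these assumptions the ordering follows from the standard recursive least squares identity: the residual at step t equals the prediction error scaled by a factor between 0 and 1, and the ordinary residual sum of squares is the sum of the products of the two. Refitting at every step is done here for clarity; a recursive update would be the efficient choice.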