The asymptotic convergence properties of system identification methods are well known, but comparatively little is known about the practical situation in which only a finite number of data points is available. In this paper we consider the finite sample properties of prediction error methods for system identification, restricting attention to ARX models and uniformly bounded criterion functions. The problem we pose is: how many data points are required to guarantee, with high probability, that the expected value of the identification criterion is close to its empirical mean value? The sample sizes are obtained using generalisations of risk minimisation theory to weakly dependent processes, yielding uniform probabilistic bounds on the difference between the expected value of the identification criterion and the empirical value evaluated on the observed data points. The bounds are very general; in particular, no assumption is made that the true system belongs to the model class. Further analysis shows that, in order to maintain a given bound on the difference, the number of data points required grows at most at a polynomial rate in the model order, and in many cases no faster than quadratically. The results obtained here generalise previous results derived for the case where the observed data were independent and identically distributed.
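The gap between the empirical and expected values of the identification criterion can be illustrated numerically. The sketch below (an illustration only, not the paper's bounds; the system parameters, noise level, and sample sizes are arbitrary choices) fits a first-order ARX model by least squares, then compares the mean-square prediction error on the N observed points with its value on a long fresh realisation, which serves as a Monte Carlo stand-in for the expected value.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_arx(n, a=0.5, b=1.0, sigma=0.1):
    """Simulate the first-order ARX system y[t] = a*y[t-1] + b*u[t-1] + e[t]
    with i.i.d. Gaussian input u and noise e (illustrative parameter values)."""
    u = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = a * y[t - 1] + b * u[t - 1] + sigma * rng.standard_normal()
    return u, y

def fit_arx(u, y):
    """Least-squares (prediction error) estimate of (a, b) for an ARX(1,1) model."""
    phi = np.column_stack([y[:-1], u[:-1]])  # regressor matrix
    theta, *_ = np.linalg.lstsq(phi, y[1:], rcond=None)
    return theta

def criterion(theta, u, y):
    """Empirical mean-square one-step prediction error of the model theta on (u, y)."""
    phi = np.column_stack([y[:-1], u[:-1]])
    return np.mean((y[1:] - phi @ theta) ** 2)

# Compare the empirical criterion on the N observed points with its value on
# a long independent realisation (a proxy for the expected criterion value).
gaps = []
for n in (50, 500, 5000):
    u, y = simulate_arx(n)
    theta = fit_arx(u, y)
    empirical = criterion(theta, u, y)
    u_fresh, y_fresh = simulate_arx(100_000)
    expected_estimate = criterion(theta, u_fresh, y_fresh)
    gaps.append(abs(empirical - expected_estimate))
    print(f"N={n:5d}  |empirical - expected| = {gaps[-1]:.5f}")
```

Note that the data here are dependent (each y[t] depends on y[t-1]), which is exactly the weakly dependent setting the abstract describes; uniform bounds over the whole model class are what the paper supplies, whereas this sketch only checks the fitted model.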