On the problem in model selection of neural network regression in overrealizable scenario

Authors:
Katsuyuki Hagiwara
Affiliations:
Faculty of Physics Engineering, Mie University, Tsu, 514-8507, Japan
Venue:
Neural Computation
Year:
2002

Abstract

When statistical model selection is considered for neural networks and radial basis functions in an overrealizable case, the problem of unidentifiability emerges. Because a model selection criterion is an unbiased estimator of the generalization error built from the training error, this article analyzes the expected training error and the expected generalization error of neural networks and radial basis functions in overrealizable cases and clarifies how they differ from those of regular models, for which identifiability holds. As a special case of an overrealizable scenario, we assume that the training data are a Gaussian noise sequence. For least-squares estimation under this assumption, we first formulate the problem so that calculating the expected errors of unidentifiable networks reduces to calculating the expectation of the supremum of a χ² process. Under this formulation, we give an upper bound on the expected training error and a lower bound on the expected generalization error, where generalization is measured at the set of training inputs. Furthermore, we give stochastic bounds on the training error and the generalization error. The obtained upper bound on the expected training error is smaller than the corresponding error in regular models, and the lower bound on the expected generalization error is larger than the corresponding error in regular models. The result tells us that the degree of overfitting in neural networks and radial basis functions is higher than in regular models and, correspondingly, that their generalization capability is worse. The article may suffice to show a difference between neural networks and regular models in the context of least-squares estimation in a simple situation. It is a first step toward constructing a model selection criterion for the overrealizable case. Further important problems in this direction are also discussed in this article.
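The reduction to the supremum of a χ² process can be illustrated numerically. The sketch below is a hypothetical toy example, not the article's analysis. It assumes a one-unit radial basis function model a·exp(−(x−c)²/(2w²)) fitted by least squares to pure N(0, 1) noise at n equally spaced inputs on [0, 1]. The center c is restricted to a finite grid, so the maximum over the grid only approximates the supremum over a continuous c. Because the true function is zero, the true amplitude is 0 and the center c is unidentifiable. For a fixed c (the regular case), the drop in residual sum of squares S = (φ_c·y)²/(φ_c·φ_c) follows a χ² distribution with one degree of freedom, so E[S] = 1. Letting c vary makes S the maximum of a χ² process. Since the true function is zero, the training error is (|y|² − S)/n and the generalization error at the training inputs is 1 + S/n, so their expectations are 1 − E[S]/n and 1 + E[S]/n. The function names and the parameters n, n_centers, width, and trials are illustrative assumptions.

```python
import math
import random


def gaussian_basis(xs, center, width):
    """Gaussian basis phi_c(x) = exp(-(x - c)^2 / (2 w^2)) evaluated at each input."""
    return [math.exp(-((x - center) ** 2) / (2.0 * width ** 2)) for x in xs]


def rss_reduction(phi, phi_norm2, y):
    """Drop in residual sum of squares from fitting y by a * phi with least squares.

    The optimal amplitude is a = (phi . y) / (phi . phi), and the drop equals
    (phi . y)^2 / (phi . phi). For N(0, 1) noise and a fixed phi it is chi^2 with 1 dof.
    """
    dot = sum(p * v for p, v in zip(phi, y))
    return dot * dot / phi_norm2


def simulate(n=50, n_centers=41, width=0.05, trials=2000, seed=0):
    """Monte Carlo estimates of E[S] for a fixed-center and a free-center basis.

    Training data are pure Gaussian noise (true function is 0), so the true amplitude
    is 0 and the center is unidentifiable. The free-center S is the maximum over a
    grid of centers, a discrete approximation to the supremum of a chi^2 process.
    """
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    centers = [j / (n_centers - 1) for j in range(n_centers)]
    bases = [gaussian_basis(xs, c, width) for c in centers]
    norms = [sum(p * p for p in phi) for phi in bases]
    mid = n_centers // 2  # regular model: center fixed at the middle grid point
    total_fixed = 0.0
    total_sup = 0.0
    for _ in range(trials):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stats = [rss_reduction(phi, nrm, y) for phi, nrm in zip(bases, norms)]
        total_fixed += stats[mid]
        total_sup += max(stats)
    return total_fixed / trials, total_sup / trials


def expected_errors(mean_s, n):
    """Expected (training, generalization) error for unit noise variance, given E[S].

    training = 1 - E[S]/n; generalization at the training inputs = 1 + E[S]/n.
    """
    return 1.0 - mean_s / n, 1.0 + mean_s / n


if __name__ == "__main__":
    n = 50
    s_fixed, s_sup = simulate(n=n)
    print("regular (fixed center):", expected_errors(s_fixed, n))
    print("free center (grid max):", expected_errors(s_sup, n))
```

Because the fixed center is itself one of the grid points, the free-center S is never smaller than the fixed-center S in any trial. The estimated training error can therefore only decrease, and the estimated generalization error can only increase, relative to the regular case. This matches the direction of the bounds stated in the abstract.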