Hierarchical learning machines such as layered neural networks have singularities in their parameter spaces. At these singularities, the Fisher information matrix becomes degenerate, so the conventional learning theory of regular statistical models does not hold. Recently, it was proved that if the parameter of the true distribution is contained in the singularities of the learning machine, the generalization error in Bayes estimation is asymptotically equal to λ/n, where 2λ is smaller than the dimension of the parameter and n is the number of training samples. However, the constant λ depends strongly on the local geometrical structure of the singularities; hence, the generalization error has not yet been clarified when the true distribution is almost, but not completely, contained in the singularities. In this article, in order to analyze such cases, we study the Bayes generalization error under the condition that the Kullback-Leibler distance of the true distribution from the distribution represented by the singularities is proportional to 1/n, and we show two results. First, if the dimension of the parameter from inputs to hidden units is not larger than three, then there exists a region of true parameters for which the generalization error is larger than that of the corresponding regular model. Second, if the dimension from inputs to hidden units is larger than three, then for an arbitrary true distribution, the generalization error is smaller than that of the corresponding regular model.
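The asymptotic comparison underlying the abstract can be sketched as follows (a hedged summary, with G(n) denoting the Bayes generalization error, d the parameter dimension, and λ the learning coefficient determined by the local geometry of the singularities; the exact form of the lower-order terms is not specified in the abstract):

```latex
% True distribution at a singularity of the machine (singular case):
G(n) \;=\; \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),
\qquad 2\lambda < d .

% Corresponding regular statistical model, for comparison:
G_{\mathrm{reg}}(n) \;=\; \frac{d}{2n} + o\!\left(\frac{1}{n}\right).

% Setting analyzed in this article: the true distribution lies near,
% but not exactly on, the singularities,
D_{\mathrm{KL}}\!\left(q \,\|\, p_{\mathrm{sing}}\right) \;\propto\; \frac{1}{n},
% and the sign of G(n) - G_{reg}(n) then depends on the
% input-to-hidden parameter dimension (the threshold is three).
```

Since 2λ < d in the singular case, the leading term λ/n is strictly smaller than the regular-model term d/(2n) when the true parameter sits exactly on the singularities; the article's contribution is the near-singular regime, where this ordering can reverse for low input-to-hidden dimension.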