What size net gives valid generalization? Neural Computation.
COLT '91 Proceedings of the fourth annual workshop on Computational learning theory.
Rigorous learning curve bounds from statistical mechanics. COLT '94 Proceedings of the seventh annual conference on Computational learning theory.
Statistical theory of learning curves under entropic loss criterion. Neural Computation.
An experimental and theoretical comparison of model selection methods. COLT '95 Proceedings of the eighth annual conference on Computational learning theory.
Learning Curves, Model Selection and Complexity of Neural Networks. Advances in Neural Information Processing Systems 5 (NIPS Conference).
Asymptotic properties of the Fisher kernel. Neural Computation.
The universal asymptotic scaling laws proposed by Amari et al. are studied in large-scale simulations on a CM-5. Small stochastic multilayer feedforward networks trained with backpropagation are investigated. For large numbers of training patterns t, the asymptotic generalization error scales as 1/t, as predicted. For a medium range of t, a faster 1/t² scaling is observed; this effect is explained by higher-order corrections of the likelihood expansion. For small t, the scaling law changes drastically when the network undergoes a transition from strong overfitting to effective learning.
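As a rough illustration of how such scaling exponents can be checked empirically, the sketch below fits a power law to a learning curve by regressing log(error) on log(t); a slope near -1 corresponds to the predicted 1/t regime, and a slope near -2 to the observed medium-range 1/t² regime. The error values here are synthetic stand-ins, not data from the paper, and the 1/t form and its coefficient are assumptions made only for the demonstration.

```python
import numpy as np

# Synthetic learning curve: error ~ c/t plus small measurement noise.
# Both the constant c=5.0 and the noise level are arbitrary choices.
t = np.array([100, 200, 400, 800, 1600, 3200])  # training-set sizes
rng = np.random.default_rng(0)
err = 5.0 / t + rng.normal(0.0, 1e-4, t.size)

# Fit log(err) = slope * log(t) + intercept; slope estimates the exponent.
slope, intercept = np.polyfit(np.log(t), np.log(err), 1)
print(f"estimated scaling exponent: {slope:.2f}")  # close to -1 for 1/t
```

In practice one would fit separate slopes over the small-, medium-, and large-t ranges, since the abstract reports different exponents in each regime.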