In this study, we theoretically analyze two essential training schemes for gradient descent learning in neural networks: batch and on-line training. We analytically investigate the convergence properties of the two schemes applied to quadratic loss functions. The convergence of each scheme to the optimal weight is quantified by the absolute value of the expected difference (Measure 1) and by the expected squared difference (Measure 2) between the optimal weight and the weight computed by the scheme. Although on-line training has several advantages over batch training with respect to Measure 1, it does not converge to the optimal weight with respect to Measure 2 if the variance of the per-instance gradient remains constant. However, if this variance decays exponentially, then on-line training does converge to the optimal weight with respect to Measure 2. Our analysis reveals the exact degrees to which the training set size, the variance of the per-instance gradient, and the learning rate affect the rate of convergence of each scheme.
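The contrast between the two schemes is easy to reproduce numerically. Below is a minimal Monte Carlo sketch, not the paper's analytical derivation: it assumes one-dimensional per-instance quadratic losses L_i(w) = (w - x_i)^2 / 2, so the per-instance gradient is w - x_i and the optimal weight for a given training set is the sample mean of the x_i. It then estimates Measure 1 = |E[w - w_opt]| and Measure 2 = E[(w - w_opt)^2] over repeated runs. All constants (SIGMA, N, ETA, EPOCHS, TRIALS) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

SIGMA = 0.5     # spread of the per-instance targets (hypothetical)
N = 50          # training set size
ETA = 0.1       # constant learning rate
EPOCHS = 40
TRIALS = 2000   # Monte Carlo trials used to estimate the expectations

def run_trial():
    """One training run; returns (w_batch - w_opt, w_online - w_opt)."""
    # Per-instance losses L_i(w) = (w - x_i)^2 / 2, so the optimal
    # weight for this training set is the sample mean of x.
    x = rng.normal(0.0, SIGMA, size=N)
    w_opt = x.mean()
    w_batch = w_online = 1.0        # common starting weight
    for _ in range(EPOCHS):
        # Batch: one update per epoch with the gradient averaged over the set.
        w_batch -= ETA * (w_batch - x).mean()
        # On-line: one update per instance, visited in random order.
        for xi in rng.permutation(x):
            w_online -= ETA * (w_online - xi)
    return w_batch - w_opt, w_online - w_opt

diffs = np.array([run_trial() for _ in range(TRIALS)])
for name, d in zip(("batch", "on-line"), diffs.T):
    # Measure 1: |E[w - w_opt]|; Measure 2: E[(w - w_opt)^2].
    print(f"{name:8s}  Measure 1 ~ {abs(d.mean()):.5f}   Measure 2 ~ {(d**2).mean():.5f}")
```

With these settings, batch training drives both measures toward zero geometrically, whereas on-line training with a constant learning rate keeps a persistent nonzero Measure 2, since the variance of the per-instance gradient stays constant; this is consistent with the abstract's claim, and decaying the learning rate or the gradient variance over epochs would shrink that residual term.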