How dependencies between successive examples affect on-line learning
Neural Computation
We study and compare different neural network learning strategies: batch-mode learning, online learning, cyclic learning, and almost-cyclic learning. Incremental learning strategies require less storage capacity than batch-mode learning. However, due to the arbitrariness in the presentation order of the training patterns, incremental learning is a stochastic process, whereas batch-mode learning is deterministic. In zeroth order, i.e., as the learning parameter η tends to zero, all learning strategies approximate the same ordinary differential equation, which for convenience we refer to as the "ideal behavior". Using stochastic methods valid for small learning parameters η, we derive differential equations describing the evolution of the lowest-order deviations from this ideal behavior. We compute how the asymptotic misadjustment, measuring the average asymptotic distance from a stable fixed point of the ideal behavior, scales as a function of the learning parameter and the number of training patterns. Knowing the asymptotic misadjustment, we calculate the typical number of learning steps necessary to generate a weight within order ε of this fixed point, both with fixed and time-dependent learning parameters. We conclude that almost-cyclic learning (learning with random cycles) is a better alternative to batch-mode learning than cyclic learning (learning with a fixed cycle) is.
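The four strategies compared in the abstract can be illustrated on a toy problem. The sketch below (not from the paper; all names and parameter values are illustrative) minimizes a quadratic loss over a finite training set, whose ideal-behavior fixed point is simply the mean of the patterns. Batch-mode learning averages the gradient over all patterns per step; online learning draws patterns independently at random; cyclic learning sweeps a fixed presentation order; almost-cyclic learning re-shuffles the order each sweep.

```python
# Hypothetical sketch comparing batch, online, cyclic, and almost-cyclic
# learning on a quadratic loss; the fixed point is the mean of the patterns.
import random

random.seed(0)
patterns = [random.gauss(0.0, 1.0) for _ in range(10)]  # finite training set
target = sum(patterns) / len(patterns)                  # fixed point of the ODE
eta = 0.05                                              # small learning parameter

def grad(w, x):
    # per-pattern gradient of the quadratic loss (w - x)**2 / 2
    return w - x

def train(strategy, epochs=2000):
    w = 5.0  # common starting weight
    order = list(range(len(patterns)))
    for _ in range(epochs):
        if strategy == "batch":
            # deterministic: average gradient over the whole training set
            w -= eta * sum(grad(w, x) for x in patterns) / len(patterns)
        else:
            if strategy == "online":
                # independent random draws with replacement
                idx = [random.randrange(len(patterns)) for _ in order]
            elif strategy == "cyclic":
                idx = order                           # fixed presentation cycle
            else:  # "almost-cyclic": freshly shuffled cycle each sweep
                idx = random.sample(order, len(order))
            for i in idx:
                w -= eta * grad(w, patterns[i])
    return w

for s in ("batch", "online", "cyclic", "almost-cyclic"):
    print(s, abs(train(s) - target))  # asymptotic distance from the fixed point
```

With a small η, batch-mode learning settles essentially on the fixed point, while the incremental strategies end up within an η-dependent neighborhood of it, in line with the misadjustment scalings the abstract describes.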