Foundations of neural networks
Learning internal representations by error propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1.
Convergence analysis of perturbed feasible descent methods. Journal of Optimization Theory and Applications.
Improved generalization via tolerant training. Journal of Optimization Theory and Applications.
Error stability properties of generalized gradient-type algorithms. Journal of Optimization Theory and Applications.
Neuro-Dynamic Programming
Neural Networks for Optimization and Signal Processing
Incremental Least Squares Methods and the Extended Kalman Filter. SIAM Journal on Optimization.
An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule. SIAM Journal on Optimization.
A New Class of Incremental Gradient Methods for Least Squares Problems. SIAM Journal on Optimization.
Incremental Subgradients for Constrained Convex Optimization: A Unified Framework and New Methods. SIAM Journal on Optimization.
We consider the class of incremental gradient methods for minimizing a sum of continuously differentiable functions. An important novel feature of our analysis is that the stepsizes are kept bounded away from zero. We derive the first convergence results of any kind for this computationally important case. In particular, we show that a certain ϵ-approximate solution can be obtained and establish the linear dependence of ϵ on the stepsize limit. Incremental gradient methods are particularly well suited for large neural network training problems, where obtaining an approximate solution is typically sufficient and often preferable to computing an exact solution. Thus, in the context of neural networks, the approach presented here is related to the principle of tolerant training. Our results justify numerous stepsize rules that were derived on the basis of extensive numerical experimentation but for which no theoretical analysis was previously available. In addition, convergence to (exact) stationary points is established when the gradient satisfies a certain growth property.
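
For illustration only, here is a minimal sketch (not taken from the paper) of the iteration the abstract analyzes: to minimize f(x) = f_1(x) + ... + f_m(x), each pass cycles through the components and applies x ← x − α ∇f_i(x) with a constant stepsize α bounded away from zero. The least-squares instance, the function names, and the stepsize value below are assumptions chosen for the example.

```python
import numpy as np

def incremental_gradient(component_grads, x0, alpha=0.01, epochs=200):
    """Cycle through the component gradients, stepping against each one.

    alpha is held constant (bounded away from zero), matching the
    stepsize regime described in the abstract above.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        for grad_i in component_grads:
            x = x - alpha * grad_i(x)
    return x

# Hypothetical least-squares instance: f_i(x) = 0.5 * (a_i @ x - b_i)**2,
# whose gradient is (a_i @ x - b_i) * a_i.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)
component_grads = [lambda x, a=a_i, bi=b_i: (a @ x - bi) * a
                   for a_i, b_i in zip(A, b)]

x_approx = incremental_gradient(component_grads, np.zeros(3), alpha=0.01)
# With alpha fixed, the iterates settle into a neighborhood of a
# stationary point; per the abstract, the achievable accuracy epsilon
# shrinks linearly with the stepsize limit.
print(x_approx)
```

Because the stepsize never vanishes, the iterates do not converge exactly but oscillate near a solution; shrinking α tightens that neighborhood at the cost of slower progress, which is the trade-off the ϵ-vs-stepsize result above quantifies.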