We consider an incremental gradient method with a momentum term for minimizing the sum of continuously differentiable functions. The method uses a new adaptive stepsize rule that decreases the stepsize whenever sufficient progress is not made. We show that if the gradients of the component functions are bounded and Lipschitz continuous over a certain level set, then every cluster point of the iterates generated by the method is a stationary point. In addition, if the gradients of the functions have a certain growth property, then the method is either linearly convergent in some sense or the stepsizes remain bounded away from zero. The new stepsize rule is much in the spirit of the heuristic learning rules used in practice for training neural networks via backpropagation; as such, it may suggest improvements on existing learning rules. Finally, we discuss the extension of the method and the convergence results to constrained minimization, along with some implementation issues and numerical experience.
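To make the flavor of the method concrete, the following is a minimal Python/NumPy sketch of an incremental gradient loop with a heavy-ball momentum term and a stepsize that is halved whenever a full cycle through the components fails to decrease the objective sufficiently. The parameter names (alpha, beta, shrink, tol) and the specific progress test are illustrative assumptions, not the exact rule analyzed in the paper.

```python
import numpy as np

def incremental_gradient_momentum(fs, grads, x0, alpha=0.1, beta=0.5,
                                  shrink=0.5, tol=1e-6, max_cycles=200):
    """Sketch: cycle through the component functions, take a gradient step
    with a momentum term, and shrink the stepsize whenever a full cycle
    fails to make sufficient progress on the summed objective.

    fs    : list of component functions f_i(x) -> float
    grads : list of component gradients g_i(x) -> ndarray
    """
    x = np.asarray(x0, dtype=float)
    d = np.zeros_like(x)                       # momentum direction
    f_prev = sum(f(x) for f in fs)             # full objective value
    for _ in range(max_cycles):
        for g in grads:                        # one pass = one cycle
            d = beta * d - alpha * g(x)        # momentum + gradient step
            x = x + d
        f_curr = sum(f(x) for f in fs)
        if f_curr > f_prev - tol:              # insufficient progress:
            alpha *= shrink                    # decrease the stepsize
        f_prev = f_curr
    return x

# Usage on a sum of smooth components: least squares, f_i(x) = 0.5*(a_i'x - b_i)^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 3)), rng.normal(size=20)
fs = [lambda x, a=a, y=y: 0.5 * (a @ x - y) ** 2 for a, y in zip(A, b)]
grads = [lambda x, a=a, y=y: (a @ x - y) * a for a, y in zip(A, b)]
x = incremental_gradient_momentum(fs, grads, np.zeros(3))
```

Note the structure of the adaptive rule: the stepsize is only ever decreased, and only when a monitored merit quantity (here, the full objective once per cycle) stalls; this is what allows the stepsizes, under the growth conditions discussed above, to remain bounded away from zero rather than being forced to vanish as in classical diminishing-stepsize schemes.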