We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that must be tuned for specific applications. The problem is that in most dynamic programming applications, the observations used to estimate a value function come from a data series that can be highly transient in its early stages. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence, and it can vary widely among the value function parameters of the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules, and derives a formula for the optimal stepsize that minimizes estimation error. The formula assumes that certain parameters are known, and an approximation is proposed for the case where they are unknown. Experimental work shows that the approximation converges faster than other popular formulas.
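As a concrete illustration of the recursive estimation problem the abstract describes, the sketch below smooths a transient data series with two stepsize rules: a deterministic harmonic rule whose constant must be tuned, and an adaptive rule that keeps steps large while the estimate is still biased and shrinks them once only noise remains. This is a minimal sketch, not the paper's exact algorithm: the error-tracking recursions, the inner smoothing constant `eta`, the safeguards, and all names here are illustrative assumptions.

```python
import random

def smooth_series(observations, stepsize_rule):
    """Recursive estimate: theta_n = (1 - alpha_n) * theta_{n-1} + alpha_n * W_n."""
    theta = observations[0]
    for n, w in enumerate(observations[1:], start=2):
        alpha = stepsize_rule(n, theta, w)
        theta = (1.0 - alpha) * theta + alpha * w
    return theta

def harmonic(a=10.0):
    # Deterministic rule a / (a + n - 1): satisfies the classical
    # convergence conditions, but the constant a must be tuned to how
    # transient the data series is.
    def rule(n, theta, w):
        return a / (a + n - 1)
    return rule

def adaptive_bias_adjusted(eta=0.1):
    # Illustrative adaptive rule (an assumption, not the paper's exact
    # formula): smooth the prediction error (a bias proxy) and the squared
    # error, back out a noise-variance estimate, and set alpha so that
    # larger bias forces larger steps.
    state = {"beta": 0.0, "nu": 0.0, "lam": 0.0}
    def rule(n, theta, w):
        eps = theta - w  # prediction error of the current estimate
        state["beta"] = (1 - eta) * state["beta"] + eta * eps
        state["nu"] = (1 - eta) * state["nu"] + eta * eps * eps
        if n <= 2 or state["nu"] <= 0.0:
            alpha = 1.0  # too little data: trust the newest observation
        else:
            sigma2 = (state["nu"] - state["beta"] ** 2) / (1.0 + state["lam"])
            alpha = max(1.0 / n, 1.0 - sigma2 / state["nu"])
        state["lam"] = (1 - alpha) ** 2 * state["lam"] + alpha ** 2
        return alpha
    return rule

# A transient series like early value function estimates: the underlying
# signal climbs toward 100 while each observation carries Gaussian noise.
random.seed(1)
obs = [100.0 * (1 - 0.95 ** n) + random.gauss(0, 5) for n in range(1, 301)]

print("harmonic stepsize :", round(smooth_series(obs, harmonic(a=10.0)), 2))
print("adaptive stepsize :", round(smooth_series(obs, adaptive_bias_adjusted()), 2))
```

On a series like this, a harmonic rule tuned too aggressively (small `a`) stops adapting before the transient dies out, while the adaptive rule detects the persistent one-sided error and keeps its stepsize large, which is the behavior the abstract's experimental comparison concerns.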