We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that must be tuned for specific applications. The problem is that in most dynamic programming applications, the observations used to estimate a value function come from a data series that can be highly transient in its early stages. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence, and it can vary widely among the value function parameters of the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules, and derives a formula for the optimal stepsize that minimizes estimation error. The formula assumes that certain parameters are known, and an approximation is proposed for the case where they are unknown. Experimental work shows that the approximation converges faster than other popular formulas.
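As a concrete illustration of the recursive estimation problem the abstract describes, the sketch below smooths a transient data series with two stepsize rules: a deterministic harmonic rule whose constant must be tuned, and an adaptive rule that keeps steps large while the estimate is still biased and shrinks them once only noise remains. This is a minimal sketch, not the paper's exact algorithm: the error-tracking recursions, the inner smoothing constant `eta`, the safeguards, and all names here are illustrative assumptions.

```python
import random

def smooth_series(observations, stepsize_rule):
    """Recursive estimate: theta_n = (1 - alpha_n) * theta_{n-1} + alpha_n * W_n."""
    theta = observations[0]
    for n, w in enumerate(observations[1:], start=2):
        alpha = stepsize_rule(n, theta, w)
        theta = (1.0 - alpha) * theta + alpha * w
    return theta

def harmonic(a=10.0):
    # Deterministic rule a / (a + n - 1): satisfies the classical
    # convergence conditions, but the constant a must be tuned to how
    # transient the data series is.
    def rule(n, theta, w):
        return a / (a + n - 1)
    return rule

def adaptive_bias_adjusted(eta=0.1):
    # Illustrative adaptive rule (an assumption, not the paper's exact
    # formula): smooth the prediction error (a bias proxy) and the squared
    # error, back out a noise-variance estimate, and set alpha so that
    # larger bias forces larger steps.
    state = {"beta": 0.0, "nu": 0.0, "lam": 0.0}
    def rule(n, theta, w):
        eps = theta - w  # prediction error of the current estimate
        state["beta"] = (1 - eta) * state["beta"] + eta * eps
        state["nu"] = (1 - eta) * state["nu"] + eta * eps * eps
        if n <= 2 or state["nu"] <= 0.0:
            alpha = 1.0  # too little data: trust the newest observation
        else:
            sigma2 = (state["nu"] - state["beta"] ** 2) / (1.0 + state["lam"])
            alpha = max(1.0 / n, 1.0 - sigma2 / state["nu"])
        state["lam"] = (1 - alpha) ** 2 * state["lam"] + alpha ** 2
        return alpha
    return rule

# A transient series like early value function estimates: the underlying
# signal climbs toward 100 while each observation carries Gaussian noise.
random.seed(1)
obs = [100.0 * (1 - 0.95 ** n) + random.gauss(0, 5) for n in range(1, 301)]

print("harmonic stepsize :", round(smooth_series(obs, harmonic(a=10.0)), 2))
print("adaptive stepsize :", round(smooth_series(obs, adaptive_bias_adjusted()), 2))
```

On a series like this, a harmonic rule tuned too aggressively (small `a`) stops adapting before the transient dies out, while the adaptive rule detects the persistent one-sided error and keeps its stepsize large, which is the behavior the abstract's experimental comparison concerns.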