Adaptation of stepsize parameter using Newton's method
PRIMA'11: Proceedings of the 14th International Conference on Agents in Principle, Agents in Practice
In this article, we propose a method to adapt the stepsize parameters used in reinforcement learning to non-stationary environments. In standard reinforcement learning settings, the stepsize parameter is decayed to zero over the course of learning, because the environment is assumed to be noisy but stationary, so that the true expected rewards are fixed. In the real world, however, we assume that the true expected rewards change over time, and the learning agent must therefore track these changes through continuous learning. We derive the higher-order derivatives, with respect to the stepsize parameter, of the exponential moving average, which major reinforcement learning methods use to estimate the expected values of states or actions, and we present a mechanism that computes these derivatives recursively. Using this mechanism, we construct a precise and flexible adaptation method for the stepsize parameter that optimizes a given criterion, for example, minimizing the squared error. The proposed method is validated both theoretically and experimentally.
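As a rough illustration of the ingredients named in the abstract, the following Python sketch maintains an exponential moving average of a scalar signal together with its first and second derivatives with respect to the stepsize alpha, updated recursively, and applies a Newton step on alpha to reduce the squared prediction error. The class name, the curvature guard, and the clipping bounds are illustrative assumptions, not details taken from the paper.

```python
import random


class AdaptiveEMA:
    """EMA whose stepsize alpha is tuned online by a Newton step.

    Sketch of the general scheme: x_hat is the exponential moving
    average, d1 and d2 are its first and second derivatives with
    respect to alpha, maintained by the recursions
        x_hat' = (1 - a) x_hat + a x
        d1'    = (x - x_hat) + (1 - a) d1
        d2'    = -2 d1 + (1 - a) d2
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.x_hat = 0.0  # current EMA estimate
        self.d1 = 0.0     # d x_hat / d alpha
        self.d2 = 0.0     # d^2 x_hat / d alpha^2

    def update(self, x):
        # Prediction error of the current estimate on the new sample.
        e = x - self.x_hat

        # Newton step on alpha for the loss L = e^2:
        #   dL/da   = -2 e d1
        #   d2L/da2 =  2 d1^2 - 2 e d2
        grad = -2.0 * e * self.d1
        curv = 2.0 * self.d1 ** 2 - 2.0 * e * self.d2
        if curv > 1e-8:  # step only where the curvature is positive
            self.alpha -= grad / curv
        self.alpha = min(max(self.alpha, 1e-4), 1.0)

        # Recursive derivative updates (use old d1 / x_hat, so order matters).
        a = self.alpha
        self.d2 = -2.0 * self.d1 + (1.0 - a) * self.d2
        self.d1 = (x - self.x_hat) + (1.0 - a) * self.d1
        self.x_hat = (1.0 - a) * self.x_hat + a * x
        return self.x_hat


# Usage: track a reward whose true mean changes abruptly mid-stream.
ema = AdaptiveEMA(alpha=0.1)
for t in range(2000):
    mean = 1.0 if t < 1000 else 5.0  # non-stationary expected reward
    ema.update(random.gauss(mean, 0.5))
print(ema.alpha, ema.x_hat)
```

In this sketch, alpha shrinks while the signal is stationary (averaging out noise) and grows again after the change point so the estimate can re-track the new mean, which is the behavior the abstract describes for non-stationary environments.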