Adaptation of stepsize parameter using Newton's method
PRIMA'11: Proceedings of the 14th International Conference on Agents in Principle, Agents in Practice
In this article, we propose a method to adapt the stepsize parameters used in reinforcement learning to non-stationary environments. In standard reinforcement learning settings, the stepsize parameter is decayed to zero over the course of learning, because the environment is assumed to be noisy but stationary, so that the true expected rewards are fixed. In the real world, however, we assume that the true expected rewards change over time, and the learning agent must therefore track these changes through continuous learning. We derive the higher-order derivatives, with respect to the stepsize parameter, of the exponential moving average, which major reinforcement learning methods use to estimate the expected values of states or actions, and we present a mechanism that computes these derivatives recursively. Using this mechanism, we construct a precise and flexible adaptation method for the stepsize parameter that optimizes a given criterion, for example, minimizing the squared error. The proposed method is validated both theoretically and experimentally.
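As a rough illustration of the ingredients named in the abstract, the following Python sketch maintains an exponential moving average of a scalar signal together with its first and second derivatives with respect to the stepsize alpha, updated recursively, and applies a Newton step on alpha to reduce the squared prediction error. The class name, the curvature guard, and the clipping bounds are illustrative assumptions, not details taken from the paper.

```python
import random


class AdaptiveEMA:
    """EMA whose stepsize alpha is tuned online by a Newton step.

    Sketch of the general scheme: x_hat is the exponential moving
    average, d1 and d2 are its first and second derivatives with
    respect to alpha, maintained by the recursions
        x_hat' = (1 - a) x_hat + a x
        d1'    = (x - x_hat) + (1 - a) d1
        d2'    = -2 d1 + (1 - a) d2
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.x_hat = 0.0  # current EMA estimate
        self.d1 = 0.0     # d x_hat / d alpha
        self.d2 = 0.0     # d^2 x_hat / d alpha^2

    def update(self, x):
        # Prediction error of the current estimate on the new sample.
        e = x - self.x_hat

        # Newton step on alpha for the loss L = e^2:
        #   dL/da   = -2 e d1
        #   d2L/da2 =  2 d1^2 - 2 e d2
        grad = -2.0 * e * self.d1
        curv = 2.0 * self.d1 ** 2 - 2.0 * e * self.d2
        if curv > 1e-8:  # step only where the curvature is positive
            self.alpha -= grad / curv
        self.alpha = min(max(self.alpha, 1e-4), 1.0)

        # Recursive derivative updates (use old d1 / x_hat, so order matters).
        a = self.alpha
        self.d2 = -2.0 * self.d1 + (1.0 - a) * self.d2
        self.d1 = (x - self.x_hat) + (1.0 - a) * self.d1
        self.x_hat = (1.0 - a) * self.x_hat + a * x
        return self.x_hat


# Usage: track a reward whose true mean changes abruptly mid-stream.
ema = AdaptiveEMA(alpha=0.1)
for t in range(2000):
    mean = 1.0 if t < 1000 else 5.0  # non-stationary expected reward
    ema.update(random.gauss(mean, 0.5))
print(ema.alpha, ema.x_hat)
```

In this sketch, alpha shrinks while the signal is stationary (averaging out noise) and grows again after the change point so the estimate can re-track the new mean, which is the behavior the abstract describes for non-stationary environments.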