Tracking in Reinforcement Learning

Authors:
Matthieu Geist; Olivier Pietquin; Gabriel Fricout
Affiliations:
IMS Research Group, Supélec, Metz, France and MC Cluster, ArcelorMittal Research, Maizières-lès-Metz, France and CORIDA project-team, INRIA Nancy - Grand Est, France; IMS Research Group, Supélec, Metz, France; MC Cluster, ArcelorMittal Research, Maizières-lès-Metz, France
Venue:
ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part I
Year:
2009

Citing 0
Cited 3

Revisiting natural actor-critics with value function approximation
MDAI'10 Proceedings of the 7th International Conference on Modeling Decisions for Artificial Intelligence

Kalman temporal differences
Journal of Artificial Intelligence Research

Social signal and user adaptation in reinforcement learning-based dialogue management
Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action and Communication

Abstract

Reinforcement learning induces non-stationarity at several levels. Adaptation to non-stationary environments is, of course, a desirable feature of a good RL algorithm. Yet even if the learning agent's environment can be considered stationary, generalized policy iteration frameworks interleave learning and control, which makes the evaluated policy, and hence its value function, non-stationary. Tracking the optimal solution is therefore preferable to trying to converge to it. In this paper, we propose to handle this tracking problem with a Kalman-based temporal difference framework. Its complexity and convergence are analyzed. Finally, an empirical investigation of its ability to handle non-stationarity is provided.
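
To make the tracking idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a linear value function, a random-walk evolution model for the weights, and a reward observation model derived from the Bellman evaluation equation; under these assumptions the update reduces to a standard Kalman filter. The class name `LinearKalmanTD`, its parameters (`prior_var`, `process_var`, `obs_var`), and the single-state test problem are all hypothetical and chosen for illustration only.

```python
class LinearKalmanTD:
    """Kalman filter over the weights of a linear value function (illustrative)."""

    def __init__(self, n_features, gamma=0.9, prior_var=10.0,
                 process_var=1.0, obs_var=1.0):
        self.gamma = gamma
        self.process_var = process_var  # variance of the random walk on the weights
        self.obs_var = obs_var          # variance of the reward observation noise
        self.theta = [0.0] * n_features
        # Weight covariance, initialised to prior_var * identity.
        self.P = [[prior_var if i == j else 0.0 for j in range(n_features)]
                  for i in range(n_features)]

    def value(self, phi):
        return sum(t * f for t, f in zip(self.theta, phi))

    def update(self, phi, reward, phi_next):
        n = len(self.theta)
        # Prediction: a random walk leaves the mean unchanged and inflates P.
        # This inflation is what keeps the gain from vanishing, so the filter
        # can keep tracking a moving target.
        for i in range(n):
            self.P[i][i] += self.process_var
        # Assumed observation model: reward = (phi - gamma * phi_next) . theta + noise.
        h = [f - self.gamma * fn for f, fn in zip(phi, phi_next)]
        Ph = [sum(self.P[i][j] * h[j] for j in range(n)) for i in range(n)]
        innov_var = sum(h[i] * Ph[i] for i in range(n)) + self.obs_var
        gain = [x / innov_var for x in Ph]
        td_error = reward - sum(h[i] * self.theta[i] for i in range(n))
        # Correction: move the weights along the gain, then shrink P.
        self.theta = [t + k * td_error for t, k in zip(self.theta, gain)]
        for i in range(n):
            for j in range(n):
                self.P[i][j] -= gain[i] * Ph[j]
        return td_error


def track_switching_reward(steps_per_phase=200):
    """Single self-looping state whose reward flips from +1 to -1."""
    agent = LinearKalmanTD(n_features=1, gamma=0.9, process_var=1.0)
    phi = [1.0]
    values = []
    for reward in (1.0, -1.0):
        for _ in range(steps_per_phase):
            agent.update(phi, reward, phi)
        values.append(agent.value(phi))
    # True values are reward / (1 - gamma): 10 before the flip, -10 after.
    return tuple(values)
```

Calling `track_switching_reward()` returns the value estimate at the end of each phase. Setting `process_var` to zero in this sketch would recover a filter whose gain decays over time, which suits convergence to a fixed target but adapts ever more slowly after the reward changes.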