A convergent reinforcement learning algorithm in the continuous case based on a finite difference method

  • Authors:
  • Remi Munos

  • Affiliations:
  • DASSAULT-AVIATION, DGT-DTN-EL, Saint-Cloud, France and CEMAGREF, LISC, Antony Cedex, France

  • Venue:
  • IJCAI'97: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1997

Abstract

In this paper, we propose a convergent Reinforcement Learning algorithm for solving optimal control problems in which the state space and the time are continuous variables. Computing a good approximation of the value function, which is essential because it provides the optimal control, is a difficult task in the continuous case. Indeed, as has been pointed out by several authors, the use of parameterized functions such as neural networks for approximating the value function may produce very poor results and may even diverge. In fact, we show that classical algorithms, such as Q-learning, used with a simple look-up table built on a regular grid, may fail to converge. The main reason is that the discretization of the state space implies a loss of the Markov property, even for deterministic continuous processes. We propose to approximate the value function with a convergent numerical scheme based on a Finite Difference approximation of the Hamilton-Jacobi-Bellman equation. We then present a model-free reinforcement learning algorithm, called Finite Difference Reinforcement Learning, and prove its convergence to the value function of the continuous problem.
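
The abstract describes approximating the value function on a regular grid via a finite-difference scheme for the Hamilton-Jacobi-Bellman equation. The sketch below is only a rough illustration of that general idea, not the paper's model-free algorithm: it runs a grid-based, discounted backup for a toy 1D deterministic control problem whose dynamics, reward, discount factor, and grid size are all assumptions made here for illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's scheme): grid-based value iteration for
# a toy 1D deterministic control problem  x' = u,  u in {-1, +1},  x in [0, 1],
# with a reward collected only at the boundaries. Each backup uses a time step
# that moves the state exactly one grid cell, so the update stays on the grid.

N = 101                      # number of grid points (assumed)
h = 1.0 / (N - 1)            # grid spacing
gamma = 0.95                 # discount factor per unit of time (assumed)
actions = (-1.0, 1.0)        # admissible controls (velocities)

V = np.zeros(N)
V[0], V[-1] = 0.0, 1.0       # boundary rewards: x = 1 is worth 1, x = 0 is worth 0

for _ in range(2000):        # value iteration sweeps
    V_new = V.copy()
    for i in range(1, N - 1):            # interior grid points only
        best = -np.inf
        for u in actions:
            tau = h / abs(u)             # time needed to cross one cell
            j = i + (1 if u > 0 else -1) # grid point reached after tau
            # Discounted backup; the running reward is zero in this toy problem.
            best = max(best, gamma ** tau * V[j])
        V_new[i] = best
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

# At convergence V(x) ~ gamma ** (1 - x): the discounted value of steering
# to the right boundary at unit speed.
print(V[::25])
```

The point of the toy example is the one made in the abstract: the quality of the grid-based approximation, and whether the backup is consistent with the underlying continuous dynamics, determines whether such schemes converge to the value function of the continuous problem.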