A General Convergence Method for Reinforcement Learning in the Continuous Case
ECML '98: Proceedings of the 10th European Conference on Machine Learning
In this paper, we propose a convergent Reinforcement Learning algorithm for solving optimal control problems in which both the state space and time are continuous. Computing a good approximation of the value function is essential, because the value function provides the optimal control, yet it is a difficult task in the continuous case. Indeed, as has been pointed out by several authors, using parameterized functions such as neural networks to approximate the value function may produce very poor results and even diverge. In fact, we show that classical algorithms such as Q-learning, used with a simple look-up table built on a regular grid, may fail to converge. The main reason is that the discretization of the state space entails a loss of the Markov property, even for deterministic continuous processes. We propose to approximate the value function with a convergent numerical scheme based on a Finite Difference approximation of the Hamilton-Jacobi-Bellman equation. We then present a model-free reinforcement learning algorithm, called Finite Difference Reinforcement Learning, and prove its convergence to the value function of the continuous problem.
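To make the finite-difference idea concrete, the following is a minimal sketch, not the paper's exact scheme: a finite-difference value iteration for the HJB equation of a one-dimensional deterministic problem. The dynamics dx/dt = u with u in {-1, +1}, the reward r(x) = -x^2, the interval [-1, 1], and the fixed boundary values are all hypothetical choices made for illustration. Discretizing the HJB equation this way induces a Markov Decision Process on the grid points, so standard value iteration applies.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's exact scheme):
# finite-difference value iteration for the HJB equation of
#   dx/dt = u,  u in {-1, +1},  running reward r(x) = -x^2 (hypothetical).
gamma = 0.95                  # discount factor per unit of time
delta = 0.01                  # spacing of the regular grid on [-1, 1]
xs = np.arange(-1.0, 1.0 + delta, delta)

def reward(x):
    return -x ** 2            # hypothetical running reward, maximal at the origin

# With |dx/dt| = 1, crossing one grid cell takes time dt = delta, so each
# discrete step earns reward(x) * delta and discounts the future by gamma**delta.
disc = gamma ** delta
V = np.zeros_like(xs)         # boundary values stay 0 (assumed absorbing boundary)
for _ in range(100_000):      # value iteration on the induced grid MDP
    V_new = V.copy()
    # Control u = +1 moves to the right neighbour, u = -1 to the left:
    # an upwind finite-difference approximation of the HJB equation.
    q_right = reward(xs[1:-1]) * delta + disc * V[2:]
    q_left = reward(xs[1:-1]) * delta + disc * V[:-2]
    V_new[1:-1] = np.maximum(q_right, q_left)   # Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

# The greedy control at each interior grid point approximates the optimal policy.
policy = np.where(V[2:] >= V[:-2], +1.0, -1.0)
```

The backup above is a gamma**delta-contraction, so the iteration converges to the value function of the grid MDP; the paper's contribution is the general proof that such finite-difference approximations converge to the value function of the continuous problem, and that a model-free variant, which replaces the known dynamics and reward in the backup with observed transitions, retains this convergence.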