A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions

Authors:
Rémi Munos
Affiliations:
Carnegie Mellon University, Robotics Institute, Pittsburgh, PA 15213, USA. munos@cs.cmu.edu
Venue:
Machine Learning
Year:
2000



Abstract

This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a non-linear differential equation called the Hamilton-Jacobi-Bellman (HJB) equation, which is of first order when the process is deterministic and of second order when it is stochastic. It is well known that this equation has infinitely many generalized solutions (differentiable almost everywhere) other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control.
In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, which is the value function. Then, we use another main result of VSs (their stability when passing to the limit) to prove the convergence of numerical approximation schemes based on finite difference (FD) and finite element (FE) methods. These methods discretize, at some resolution, the HJB equation into a DP equation of a Markov Decision Process (MDP), which could be solved by DP methods (thanks to a “strong” contraction property) if all the initial data (the state dynamics and the reinforcement function) were perfectly known. However, in the RL approach, we consider a system that interacts with an a priori (at least partially) unknown environment and learns “from experience”, so the initial data are not perfectly known and have to be approximated during learning. The main contribution of this work is to derive a general convergence theorem for RL algorithms when one uses only “approximations” (in the sense of satisfying some “weak” contraction property) of the initial data.
This result can be used for model-based or model-free RL algorithms, with off-line or on-line updating methods, for deterministic or stochastic state dynamics (though this latter case is not described here), and based on FE or FD discretization methods. It is illustrated with several RL algorithms and one numerical simulation for the “Car on the Hill” problem.
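
To make the discretization step concrete, here is a minimal sketch of how an upwind finite-difference scheme turns a continuous control problem into a DP fixed-point equation solved by value iteration. It is an illustration only, not the paper's algorithm. Everything specific in it is an assumption chosen for the example: the one-dimensional state space [0, 1], the dynamics dx/dt = u with u in {-1, +1}, zero running reinforcement, the boundary reinforcements 0.8 and 1.0, the discount factor 0.5 per unit time, and the function name `fd_value_iteration`. The sketch also assumes the dynamics and reinforcement are perfectly known, which is the DP setting rather than the RL setting.

```python
import numpy as np


def fd_value_iteration(n=51, gamma=0.5, r_left=0.8, r_right=1.0,
                       tol=1e-12, max_iter=10_000):
    """Value iteration on an upwind finite-difference grid (toy 1D problem).

    Assumed problem: state x in [0, 1], dynamics dx/dt = u with
    u in {-1, +1}, zero running reinforcement, terminal reinforcement
    r_left at x = 0 and r_right at x = 1, discount gamma per unit time.
    """
    delta = 1.0 / (n - 1)          # grid spacing
    controls = (-1.0, 1.0)
    V = np.zeros(n)
    V[0], V[-1] = r_left, r_right  # boundary values are fixed

    for _ in range(max_iter):
        candidates = []
        for u in controls:
            tau = delta / abs(u)           # time needed to reach the neighbour
            step = 1 if u > 0 else -1      # upwind neighbour in direction of u
            neighbours = V[1 + step:n - 1 + step]
            candidates.append(gamma ** tau * neighbours)
        V_new = V.copy()
        # DP backup on interior nodes: best discounted neighbour value
        V_new[1:-1] = np.maximum.reduce(candidates)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


if __name__ == "__main__":
    V = fd_value_iteration()
    x = np.linspace(0.0, 1.0, V.size)
    # Closed-form value of this toy problem: head to the better boundary
    exact = np.maximum(0.8 * 0.5 ** x, 1.0 * 0.5 ** (1.0 - x))
    print(np.max(np.abs(V - exact)))
```

Each backup multiplies neighbour values by `gamma ** tau`, which is strictly less than 1, so the update is a contraction and the iteration has a unique fixed point. In this toy problem the value function has a kink where the two exit strategies are equally good, so it is differentiable only almost everywhere.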