A General Convergence Method for Reinforcement Learning in the Continuous Case
ECML '98 Proceedings of the 10th European Conference on Machine Learning
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a non-linear differential equation, of first order for deterministic processes and of second order for stochastic ones, called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that this equation admits infinitely many generalized solutions (differentiable almost everywhere) other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control.

In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, which is the value function. Then we use another main result of VSs (their stability when passing to the limit) to prove the convergence of numerical approximation schemes based on finite difference (FD) and finite element (FE) methods. These methods discretize, at some resolution, the HJB equation into the DP equation of a Markov Decision Process (MDP), which can be solved by DP methods (thanks to a "strong" contraction property) if all the initial data (the state dynamics and the reinforcement function) are perfectly known. However, in the RL approach, we consider a system interacting with an a priori (at least partially) unknown environment and learning "from experience", so the initial data are not perfectly known but have to be approximated during learning.

The main contribution of this work is a general convergence theorem for RL algorithms that use only "approximations" (in the sense of satisfying a "weak" contraction property) of the initial data. This result applies to model-based and model-free RL algorithms, with off-line or on-line updating methods, for deterministic or stochastic state dynamics (though the latter case is not described here), and based on FE or FD discretization methods. It is illustrated with several RL algorithms and a numerical simulation of the "Car on the Hill" problem.
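To fix ideas, here is one standard form of the deterministic, discounted HJB equation the abstract refers to; the notation (dynamics f, reinforcement r, discount rate lambda) follows a common convention and is not necessarily the paper's own:

    V(x) = \sup_{u(\cdot)} \int_0^{\infty} e^{-\lambda t}\, r\big(x(t), u(t)\big)\, dt,
    \qquad \dot{x}(t) = f\big(x(t), u(t)\big), \quad x(0) = x

    \lambda V(x) = \sup_{u \in U} \Big[\, r(x, u) + \nabla V(x) \cdot f(x, u) \,\Big]

The second line is the first-order HJB equation; in the stochastic case a second-order diffusion term is added. Because V is in general only differentiable almost everywhere, it solves this equation only in the viscosity sense, which is why uniqueness is recovered in that framework.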
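The discretize-then-solve step can also be illustrated with a small sketch. The toy problem, grid, and all names below are illustrative assumptions, not the paper's code: a finite-difference-style discretization of a 1-D deterministic control problem into an MDP whose DP equation is solved by value iteration, which converges thanks to the "strong" contraction property of the backup operator when the dynamics and reinforcement are perfectly known.

    import numpy as np

    # Hedged sketch, not the paper's algorithm: discretize a 1-D deterministic
    # control problem (dx/dt = u, reinforcement r(x) = -x^2) on a uniform grid,
    # then solve the resulting MDP's DP equation by value iteration.
    # Time step dt and grid spacing dx are the "resolution" of the scheme.

    n, dx, dt, lam = 101, 0.02, 0.02, 1.0   # grid size, spacing, time step, discount rate
    xs = np.linspace(-1.0, 1.0, n)          # discretized state space
    gamma = np.exp(-lam * dt)               # per-step discount of the discrete MDP
    actions = [-1.0, 1.0]                   # admissible controls u

    def step(i, u):
        """Euler step of the dynamics dx/dt = u, clipped to the grid."""
        x_next = np.clip(xs[i] + u * dt, xs[0], xs[-1])
        return int(round((x_next - xs[0]) / dx))  # nearest grid index

    V = np.zeros(n)
    for _ in range(5000):
        # DP backup: V(x) <- max_u [ r(x) dt + gamma * V(next state) ].
        # This operator is a gamma-contraction in sup norm, so iteration converges.
        V_new = np.array([max(-xs[i] ** 2 * dt + gamma * V[step(i, u)]
                              for u in actions) for i in range(n)])
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new

In the RL setting, the transition and reinforcement terms inside the backup would be replaced by quantities estimated from experience; this is exactly where the paper's "weak" contraction condition and general convergence theorem come into play.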