In this paper we present, in a continuous-time framework, an online approach to direct adaptive optimal control with an infinite-horizon cost for nonlinear systems. The algorithm converges online to the optimal control solution without knowledge of the internal system dynamics, and closed-loop dynamic stability is guaranteed throughout. The algorithm is based on a reinforcement learning scheme, namely policy iteration. It uses two neural networks in an actor/critic structure to parametrically represent the control policy and the performance of the control system. The two neural networks are trained to express the optimal controller and the optimal cost function that describes the infinite-horizon control performance. Convergence of the algorithm is proven under the realistic assumption that the two neural networks do not provide perfect representations of the nonlinear control and cost functions. The result is a hybrid control structure that combines a continuous-time controller with a supervisory adaptation structure. This adaptation structure operates on data sampled from the plant and from the continuous-time performance dynamics. Such a control structure differs from the standard controller forms previously reported in the literature. Simulation results for two second-order nonlinear systems are provided.
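The policy-iteration loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on simplifying assumptions of our own. The plant is a scalar linear system x' = a x + b u rather than a nonlinear one. The critic is a single quadratic weight p in V(x) = p x^2 instead of a neural network, and the actor is a linear gain k. Policy evaluation uses an integral form of the Bellman equation over sampled intervals of length T. That form needs only measured states and accumulated cost, not the drift a. The input gain b is assumed known for the improvement step, and the initial gain must already stabilize the plant. All names and numbers (A_DRIFT, B_INPUT, Q, R, T, the sample states) are hypothetical.

```python
import math

# Hypothetical plant: the learner never reads A_DRIFT directly.
A_DRIFT = 1.0    # unknown internal (drift) dynamics; open loop is unstable
B_INPUT = 1.0    # input gain, assumed known to the learner
Q, R = 1.0, 1.0  # weights in the cost: integral of (Q x^2 + R u^2) dt


def simulate_interval(x0, k, T, dt=1e-4):
    """Stand-in for measuring the real plant over [0, T] under u = -k x.

    Integrates x' = a x + b u with forward Euler and accumulates the
    running cost. Returns (x(T), integral of Q x^2 + R u^2 over [0, T]).
    """
    x, cost = x0, 0.0
    for _ in range(int(round(T / dt))):
        u = -k * x
        cost += (Q * x * x + R * u * u) * dt
        x += (A_DRIFT * x + B_INPUT * u) * dt
    return x, cost


def policy_iteration(k0, iterations=8, T=0.1,
                     x_samples=(1.0, -0.5, 2.0, 0.8, -1.5)):
    """Actor/critic policy iteration; k0 must be a stabilizing gain."""
    k, p = k0, None
    for _ in range(iterations):
        # Policy evaluation (critic): fit V(x) = p x^2 to
        #   p * (x(t)^2 - x(t+T)^2) = integral of the cost over [t, t+T]
        # using only sampled data from simulate_interval.
        num = den = 0.0
        for x0 in x_samples:
            xT, cost = simulate_interval(x0, k, T)
            phi = x0 * x0 - xT * xT
            num += phi * cost
            den += phi * phi
        p = num / den  # one-parameter least-squares solution
        # Policy improvement (actor): u = -(1/2) R^-1 b dV/dx = -(b p / R) x
        k = B_INPUT * p / R
    return p, k


if __name__ == "__main__":
    p, k = policy_iteration(k0=2.0)
    print("learned p:", p, " Riccati p:", 1 + math.sqrt(2))
    print("learned gain k:", k)
```

For a = b = Q = R = 1, the continuous-time Riccati solution is p = 1 + sqrt(2) ≈ 2.414, so the learned weight should land close to that value. The small residual comes from the forward-Euler integration inside the simulated measurements. In the paper's setting, the scalar weight and the gain would be replaced by neural-network critic and actor approximators.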