In this paper we discuss an online policy-iteration algorithm for learning the continuous-time (CT) infinite-horizon optimal control solution for nonlinear systems with known dynamics; that is, the algorithm learns online, in real time, the solution of the Hamilton-Jacobi-Bellman (HJB) equation arising in optimal control design. The method finds real-time approximations of both the optimal cost and the optimal control policy while guaranteeing closed-loop stability. We present an online adaptive algorithm implemented as an actor/critic structure that involves simultaneous continuous-time adaptation of both actor and critic neural networks; we call this 'synchronous' policy iteration. A persistence-of-excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra nonstandard terms in the actor tuning law required to guarantee closed-loop dynamical stability. Convergence to the optimal controller is proven, and stability of the system is guaranteed. Simulation examples show the effectiveness of the new algorithm.
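The policy-evaluation/policy-improvement cycle that the synchronous algorithm carries out continuously can be illustrated offline on a scalar linear-quadratic problem, where each step has a closed form. This is only a sketch under simplifying assumptions: the system, gains, and function names below are illustrative, and the paper's actual contribution is the simultaneous real-time tuning of actor and critic neural networks for general nonlinear systems.

```python
import math

# Hypothetical scalar linear-quadratic example chosen for illustration:
# dynamics x_dot = a*x + b*u, cost integral of (q*x**2 + r*u**2) dt.
a, b, q, r = -1.0, 1.0, 1.0, 1.0

def evaluate_policy(k):
    """Policy evaluation (critic step): for the policy u = -k*x the value is
    V(x) = P*x**2, where P solves 2*(a - b*k)*P + q + r*k**2 = 0."""
    return (q + r * k**2) / (2.0 * (b * k - a))

def improve_policy(P):
    """Policy improvement (actor step): u = -(1/(2r))*b*dV/dx = -(b*P/r)*x,
    i.e. the improved gain is k = b*P/r."""
    return b * P / r

k = 0.0  # initial stabilizing gain: a - b*k = -1 < 0
for _ in range(10):
    P = evaluate_policy(k)
    k = improve_policy(P)

# The iterates converge to the Riccati solution P* = sqrt(2) - 1 ≈ 0.4142.
print(P)
```

In the synchronous scheme of the paper these two steps are not alternated but performed concurrently by differential tuning laws on the critic and actor network weights, with a persistence-of-excitation condition playing the role that the closed-form solve plays here.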