Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem

Authors:
Kyriakos G. Vamvoudakis; Frank L. Lewis
Affiliations:
Automation and Robotics Research Institute, The University of Texas at Arlington, 7300 Jack Newell Blvd. S., Ft. Worth, TX 76118, USA (both authors)
Venue:
Automatica (Journal of IFAC)
Year:
2010

Abstract

In this paper we discuss an online algorithm, based on policy iteration, for learning the continuous-time (CT) optimal control solution with infinite-horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the Hamilton-Jacobi (HJ) equation of optimal control design. The method finds suitable approximations of both the optimal cost and the optimal control policy in real time, while also guaranteeing closed-loop stability. We present an online adaptive algorithm implemented as an actor/critic structure that involves simultaneous continuous-time adaptation of both the actor and the critic neural networks. We call this 'synchronous' policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both the critic and the actor networks; extra nonstandard terms in the actor tuning law are required to guarantee closed-loop dynamical stability. Convergence to the optimal controller is proven, and stability of the closed-loop system is guaranteed. Simulation examples show the effectiveness of the new algorithm.
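
For orientation, the policy-iteration idea that the abstract builds on can be illustrated with a minimal sketch of classical sequential policy iteration (Kleinman's algorithm) on a scalar linear-quadratic problem. This is not the synchronous actor-critic tuning described in the paper: here the critic step (policy evaluation) and the actor step (policy improvement) alternate offline rather than adapting simultaneously, and no neural networks are involved. The system dx/dt = a*x + b*u, the cost weights q and r, the starting gain k0, and all function names are assumptions chosen for illustration only.

```python
import math


def policy_iteration_scalar_lqr(a, b, q, r, k0, iters=20):
    """Sequential policy iteration for dx/dt = a*x + b*u with
    cost = integral of (q*x^2 + r*u^2) dt and policy u = -k*x.

    Illustrative assumption: a scalar linear system, so the value
    function is V(x) = p*x^2 and each step has a closed form.
    """
    k = k0
    if a - b * k >= 0:
        raise ValueError("initial gain k0 must stabilize the closed loop")
    history = []
    for _ in range(iters):
        # Policy evaluation (critic): solve the Lyapunov equation
        # 2*(a - b*k)*p + q + r*k^2 = 0 for p.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement (actor): u = -(1/2) * (1/r) * b * dV/dx = -(b*p/r) * x.
        k = b * p / r
        history.append(p)
    return p, k, history


def riccati_scalar(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Example: an open-loop unstable plant with a stabilizing initial gain.
p, k, history = policy_iteration_scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
p_star = riccati_scalar(a=1.0, b=1.0, q=1.0, r=1.0)
print(f"policy iteration p = {p:.10f}, Riccati solution p* = {p_star:.10f}")
```

In this sketch each evaluated cost is no larger than the previous one, and the iterates approach the Riccati solution, which plays the role of the HJ solution in the linear-quadratic case. The synchronous scheme in the abstract replaces the two alternating closed-form steps with continuous-time tuning laws that run at the same time.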