An online adaptive reinforcement-learning-based solution is developed for the infinite-horizon optimal control problem for continuous-time uncertain nonlinear systems. A novel actor-critic-identifier (ACI) architecture is proposed to approximate the Hamilton-Jacobi-Bellman equation using three neural network (NN) structures: the actor and critic NNs approximate the optimal control and the optimal value function, respectively, while a robust dynamic NN identifier asymptotically approximates the uncertain system dynamics. An advantage of the ACI architecture is that learning by the actor, critic, and identifier proceeds continuously and simultaneously, without requiring knowledge of the system drift dynamics. Convergence of the algorithm is analyzed using Lyapunov-based adaptive control methods. A persistence of excitation condition is required to guarantee exponential convergence to a bounded region in a neighborhood of the optimal control and uniformly ultimately bounded (UUB) stability of the closed-loop system. Simulation results demonstrate the performance of the actor-critic-identifier method for approximate optimal control.
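To illustrate the actor-critic idea behind this architecture, the following is a minimal sketch, not the paper's actual update laws or proofs. It assumes a known scalar system xdot = a*x + u (so no identifier NN is needed), quadratic cost r = x^2 + u^2, linear-in-parameters approximators V(x) = w_c*x^2 (critic) and u(x) = -w_a*x (actor), and randomly sampled states standing in for the persistence of excitation condition. All names and constants here are illustrative choices, not from the paper.

```python
import numpy as np

# Hedged sketch of actor-critic learning on the HJB residual; NOT the
# paper's ACI update laws. For a = -1 and cost x^2 + u^2, the optimal
# value is V(x) = w*x^2 with w solving w^2 + 2*w - 1 = 0, i.e.
# w = sqrt(2) - 1 ~ 0.414, and the optimal control is u = -w*x.

rng = np.random.default_rng(0)
a = -1.0                 # assumed-known drift coefficient (no identifier)
w_c, w_a = 0.0, 0.0      # critic and actor weights
lr_c, lr_a = 0.1, 0.1    # learning rates

for _ in range(5000):
    x = rng.uniform(-2.0, 2.0)        # exploratory state sample ("PE")
    u = -w_a * x                      # actor's control
    xdot = a * x + u
    # HJB (Bellman) residual: r(x, u) + (dV/dx) * xdot should be zero
    delta = x**2 + u**2 + (2.0 * w_c * x) * xdot
    # Normalized gradient descent on the squared residual (critic)
    phi = 2.0 * x * xdot              # d(delta)/d(w_c)
    w_c -= lr_c * delta * phi / (1.0 + phi**2)
    # Actor tracks the control implied by the critic: u* = -(dV/dx)/2
    w_a -= lr_a * (w_a - w_c)

print(round(w_c, 3), round(w_a, 3))   # both near sqrt(2) - 1
```

In the paper's setting the drift would be unknown and supplied by the identifier NN, and the weight updates would carry the robustifying terms needed for the Lyapunov-based UUB guarantee; this sketch only shows the shared structure of critic updates driven by the HJB residual and an actor tracking the critic's implied control.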