Generalized policy iteration for continuous-time systems

  • Authors:
  • Draguna Vrabie; Frank L. Lewis

  • Affiliations:
  • Automation and Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX; Automation and Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX

  • Venue:
  • IJCNN'09: Proceedings of the 2009 International Joint Conference on Neural Networks
  • Year:
  • 2009

Abstract

In this paper we present a unified point of view on the Approximate Dynamic Programming (ADP) algorithms that have been developed in recent years for continuous-time (CT) systems. We introduce, in a continuous-time formulation, Generalized Policy Iteration (GPI), and show that it represents a spectrum of algorithms with the exact Policy Iteration (PI) algorithm at one end and the Value Iteration (VI) algorithm at the other. In the middle of this spectrum we formulate, for the first time, the Optimistic Policy Iteration (OPI) algorithm for CT systems. We derive GPI starting from a new formulation of the PI algorithm in which the value function at the policy evaluation step is obtained by an iterative process. The GPI algorithm is implemented on an Actor/Critic structure. The results allow implementation of a family of adaptive controllers that converge online to the solution of the optimal control problem without knowing or identifying the internal dynamics of the system. Simulation results are provided to verify convergence to the optimal control solution.
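
Below is a minimal numerical sketch of the GPI idea described in the abstract, written for a continuous-time LQR problem. It is an illustration under stated assumptions, not the paper's actor/critic implementation: the system matrices, step size `h`, sweep count `n_sweeps`, and iteration counts are invented for the example, and the partial policy evaluation is done with simple Euler sweeps on a Lyapunov-type update rather than the paper's online, model-free scheme. With many sweeps per iteration the loop behaves like exact PI; with a single sweep it resembles a VI-style update, which is the spectrum GPI spans.

```python
# Hedged sketch: generalized policy iteration (partial policy evaluation +
# greedy policy improvement) for a continuous-time LQR problem.
# All numerical choices below are illustrative assumptions, not from the paper.
import numpy as np
from scipy.linalg import solve_continuous_are

# CT system xdot = A x + B u with cost integral(x' Q x + u' R u) dt
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def policy_evaluation(K, P, n_sweeps, h=0.01):
    """Approximately evaluate the policy u = -K x.

    The exact evaluation solves the Lyapunov equation
        Ac' P + P Ac + Q + K' R K = 0,  Ac = A - B K.
    Here we only take a few Euler sweeps of Pdot = Ac' P + P Ac + Qc,
    warm-started from the previous estimate: the 'partial evaluation'
    that turns exact PI into GPI/optimistic PI.
    """
    Ac = A - B @ K
    Qc = Q + K.T @ R @ K
    for _ in range(n_sweeps):
        P = P + h * (Ac.T @ P + P @ Ac + Qc)
    return P

def policy_improvement(P):
    """Greedy policy for the current value estimate: K = R^{-1} B' P."""
    return np.linalg.solve(R, B.T @ P)

# Initial stabilizing gain and value-function estimate
K = np.array([[0.0, 1.0]])
P = np.zeros((2, 2))

for _ in range(200):
    P = policy_evaluation(K, P, n_sweeps=20)  # partial evaluation step
    K = policy_improvement(P)                 # improvement step

# Compare against the optimal LQR solution from the algebraic Riccati equation
P_opt = solve_continuous_are(A, B, Q, R)
K_opt = np.linalg.solve(R, B.T @ P_opt)
print("GPI gain K:\n", K)
print("Optimal gain K*:\n", K_opt)
```

Varying `n_sweeps` moves the scheme along the GPI spectrum: a large value makes each outer iteration close to an exact policy evaluation (PI-like), while `n_sweeps=1` yields a single-sweep, VI-like update, at the cost of more outer iterations before the gain approaches the Riccati solution.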