This paper mathematically analyzes the integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with an unknown system matrix A. GPI is the general idea of interleaving the policy evaluation and policy improvement steps of policy iteration (PI) to compute the optimal policy. We first introduce the update horizon T, and then show that (i) all of the I-GPI methods with the same T can be considered equivalent, and that (ii) the value function approximated in the policy evaluation step converges monotonically to the exact one as T → ∞. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as the relation between I-PI and I-GPI in the limit T → ∞. We also provide and discuss two modes of convergence of I-GPI: in one mode I-GPI behaves like PI, while in the other it performs like value iteration for discrete-time LQR and like infinitesimal GPI (T → 0). From these results, a new classification of integral reinforcement learning is formed with respect to T. Two matrix-inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided with detailed discussion. Numerical simulations are carried out for verification and further investigation.
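As a point of reference for the limiting behavior described above, the following is a minimal sketch of model-based policy iteration for continuous-time LQR (Kleinman's algorithm), which the abstract identifies as the T → ∞ limit of I-GPI. The system matrices, gains, and iteration count below are illustrative assumptions; in particular, A is taken as known here purely for illustration, whereas the paper's data-driven I-GPI methods avoid exactly that requirement.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Continuous-time LQR: minimize ∫ (x'Qx + u'Ru) dt  subject to  x' = Ax + Bu.
# Example system (assumed for illustration); A is Hurwitz, so K = 0 is stabilizing.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

K = np.zeros((1, 2))  # initial stabilizing feedback gain
for _ in range(20):
    Ak = A - B @ K
    # Policy evaluation: solve the Lyapunov equation Ak'P + P Ak + Q + K'RK = 0
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K = R^{-1} B' P
    K = np.linalg.solve(R, B.T @ P)

# PI converges to the solution of the algebraic Riccati equation
P_star = solve_continuous_are(A, B, Q, R)
print(np.allclose(P, P_star))
```

GPI generalizes this scheme by stopping the policy evaluation step early (finite update horizon T) rather than solving the Lyapunov equation exactly, trading per-step computation against the number of policy updates.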