Continuous-Time Adaptive Critics

Authors:
T. Hanselmann;L. Noakes;A. Zaknich
Affiliations:
Dept. of Electr. & Electron. Eng., Melbourne Univ., Parkville, Vic.;-;-
Venue:
IEEE Transactions on Neural Networks
Year:
2007

Citing 0
Cited 9

2009 Special Issue: Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems
Neural Networks

Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints
IEEE Transactions on Neural Networks

Reinforcement learning and adaptive dynamic programming for feedback control
IEEE Circuits and Systems Magazine

Asymptotically stable adaptive critic design for uncertain nonlinear systems
ACC'09 Proceedings of the 2009 conference on American Control Conference

Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem
IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks

Generalized policy iteration for continuous-time systems
IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks

Adaptive dynamic programming: an introduction
IEEE Computational Intelligence Magazine

Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem
Automatica (Journal of IFAC)

On integral generalized policy iteration for continuous-time linear quadratic regulations
Automatica (Journal of IFAC)

Quantified Score

Hi-index	0.01

Abstract

A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections are made to the discrete-time case, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. The practical benefits are that this framework fits well with plant descriptions given by differential equations, and that any standard integration routine with adaptive step size performs adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence with a general plant and critic. In addition, a fast critic update for concurrent actor-critic training is introduced: it immediately applies the adjustments of the critic parameters induced by actor updates, so that the Bellman optimality equation remains correct to first-order approximation after actor changes. Thus, critic and actor updates may be performed at the same time until a substantial error builds up in the Bellman optimality or temporal difference equation. At that point, traditional critic training needs to be performed, after which another interval of concurrent actor-critic training may resume.
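
To make the idea of a second-order (Newton) actor update concrete, the following is a minimal sketch on a toy problem and is not the algorithm from the paper. It assumes a scalar linear plant dx/dt = a*x + b*u, a running cost q*x^2 + r*u^2, a quadratic critic V(x) = p*x^2, and a linear actor u = -k*x. All of these names and numbers are hypothetical choices for illustration. The sketch alternates a conventional critic evaluation with a Newton step on the actor gain, which is plain policy iteration rather than the concurrent actor-critic training described in the abstract.

```python
def hamiltonian(x, k, p, a, b, q, r):
    """Running cost plus dV/dx * f(x, u), with V(x) = p*x**2 and u = -k*x."""
    u = -k * x
    return q * x**2 + r * u**2 + 2.0 * p * x * (a * x + b * u)


def newton_actor_step(x, k, p, a, b, q, r, h=1e-4):
    """One Newton step on the actor gain k that minimizes the Hamiltonian.

    First and second derivatives are taken by central differences, so the
    step does not rely on a closed form for this particular plant.
    """
    def H(kk):
        return hamiltonian(x, kk, p, a, b, q, r)

    grad = (H(k + h) - H(k - h)) / (2.0 * h)
    hess = (H(k + h) - 2.0 * H(k) + H(k - h)) / h**2
    return k - grad / hess


def critic_for_policy(k, a, b, q, r):
    """Exact critic weight p for the policy u = -k*x.

    Solves 2*p*(a - b*k) + q + r*k**2 = 0, valid only for a stabilizing
    gain (a - b*k < 0).
    """
    return (q + r * k**2) / (2.0 * (b * k - a))


# Hypothetical plant and cost parameters (assumptions for illustration).
a, b, q, r = 1.0, 1.0, 1.0, 1.0
k = 3.0  # initial stabilizing actor gain

for _ in range(8):
    p = critic_for_policy(k, a, b, q, r)          # traditional critic update
    k = newton_actor_step(1.0, k, p, a, b, q, r)  # second-order actor update

# Closed-form solution of the scalar Riccati equation, for comparison.
p_riccati = (a * r + (a**2 * r**2 + b**2 * q * r) ** 0.5) / b**2
k_riccati = p_riccati * b / r
```

Because the Hamiltonian here is quadratic in the gain, a single Newton step lands on its minimizer for the current critic, and the alternating loop approaches the Riccati gain `k_riccati` within a few iterations. For a general plant and critic the Hamiltonian is not quadratic, so several Newton steps per critic update may be needed.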