Reinforcement learning and adaptive dynamic programming for feedback control
IEEE Circuits and Systems Magazine
Brief paper: Optimality and convergence of adaptive optimal control by reinforcement synthesis
Automatica (Journal of IFAC)
IEEE Transactions on Neural Networks
Synthesizing optimal control strategies for nonlinear systems over an infinite horizon, subject to mixed equality and inequality constraints, has long been a challenge for control engineers. This paper treats it as a problem of finite-time optimization within infinite-horizon control and devises a reinforcement learning agent, termed the Adaptive Optimal Control (AOC) agent, to carry out the finite-time optimization procedures. The control is adaptive in the sense that the finite-time optimization procedure is activated whenever needed to improve the control strategy or to adapt to a real-world environment. The Nonlinear Quadratic Regulator (NQR) is shown to be a typical example of a controller that the AOC agent can find. The optimality conditions and adaptation rules for the AOC agent are deduced from Pontryagin's minimum principle. The requirements for convergence and stability of the AOC system are also given.
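For orientation, the textbook form of Pontryagin's minimum principle for an unconstrained finite-horizon problem can be sketched as follows. This is a generic statement, not the paper's own formulation: the symbols $x$, $u$, $\lambda$, $f$, $L$, $\phi$, $T$ and the admissible set $\mathcal{U}$ are assumptions introduced here for illustration, and the mixed equality and inequality constraints treated in the paper are omitted.

```latex
% Assumed problem (illustrative notation, free terminal state):
%   minimize  J = \phi(x(T)) + \int_0^T L(x(t), u(t))\,dt
%   subject to \dot{x}(t) = f(x(t), u(t)),  x(0) = x_0,  u(t) \in \mathcal{U}
\begin{align}
  H(x, u, \lambda) &= L(x, u) + \lambda^{\top} f(x, u)
    && \text{(Hamiltonian)} \\
  \dot{x}^{*}(t) &= \frac{\partial H}{\partial \lambda}
    = f\bigl(x^{*}(t), u^{*}(t)\bigr), \quad x^{*}(0) = x_0
    && \text{(state equation)} \\
  \dot{\lambda}(t) &= -\frac{\partial H}{\partial x}
    \bigl(x^{*}(t), u^{*}(t), \lambda(t)\bigr), \quad
    \lambda(T) = \frac{\partial \phi}{\partial x}\bigl(x^{*}(T)\bigr)
    && \text{(costate equation)} \\
  u^{*}(t) &= \arg\min_{u \in \mathcal{U}}
    H\bigl(x^{*}(t), u, \lambda(t)\bigr)
    && \text{(minimum condition)}
\end{align}
```

Under these assumptions, a finite-time optimization procedure of the kind the abstract describes would amount to solving this two-point boundary value problem over a window of length $T$ and repeating it as needed; how the AOC agent actually does this is specified in the paper, not here.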