This paper proposes an integral Q-learning method for continuous-time (CT) linear time-invariant (LTI) systems, which solves a linear quadratic regulation (LQR) problem in real time for a given system and value function, without knowledge of the system matrices A and B. Here, Q-learning refers to a family of reinforcement learning methods that find the optimal policy through interaction with an uncertain environment. We first develop an explorized policy iteration (PI) method that can handle known exploration signals. The integral Q-learning algorithm for CT LTI systems is then derived from this PI method and from variants of the Q-function obtained via singular perturbation of the control input. The proposed Q-learning scheme evaluates the current value function and the improved control policy simultaneously, and is proven to be stable and convergent to the LQ optimal solution, provided that the initial policy is stabilizing. Practical online implementations of the proposed algorithms are investigated in terms of persistency of excitation (PE) and exploration. Finally, simulation results are provided to compare and verify the performance of the algorithms.
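The paper's integral Q-learning estimates the value function and improved gain from trajectory data without using A and B. As a model-based point of reference, the underlying policy-iteration recursion it builds on (Kleinman's algorithm: policy evaluation via a Lyapunov equation, then policy improvement) can be sketched as below. The system matrices, gains, and iteration count here are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_pi(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration (Kleinman) for the CT LQR problem.

    Each step evaluates the current policy u = -K x by solving the
    Lyapunov equation  (A - B K)^T P + P (A - B K) + Q + K^T R K = 0,
    then improves the policy with  K = R^{-1} B^T P.  As the abstract
    notes for the data-driven version, the initial gain K0 must be
    stabilizing for the iteration to converge to the LQ optimum.
    """
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # solve_continuous_lyapunov(a, q) solves a X + X a^T = q
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)  # policy improvement
    return P, K

# Illustrative second-order system; A is Hurwitz, so K0 = 0 is stabilizing.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P, K = kleinman_pi(A, B, Q, R, np.zeros((1, 2)))
```

The fixed point of this recursion is the stabilizing solution of the continuous-time algebraic Riccati equation; the integral Q-learning scheme of the paper replaces the Lyapunov-equation step with least-squares estimation along system trajectories, which is where the persistency-of-excitation and exploration conditions enter.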