In this paper, the optimal strategies for discrete-time linear quadratic zero-sum games related to the H-infinity optimal control problem are solved forward in time without knowledge of the system dynamics matrices. The idea is to solve for an action-dependent value function Q(x,u,w) of the zero-sum game instead of the state-dependent value function V(x), which satisfies a corresponding game algebraic Riccati equation (GARE). Since the state and action spaces are continuous, two action networks and one critic network are used and tuned adaptively in forward time by adaptive critic methods. The result is a model-free Q-learning approximate dynamic programming (ADP) approach that solves the zero-sum game forward in time. It is proven that the critic converges to the game value function and that the action networks converge to the Nash equilibrium of the game; the algorithm thus amounts to a model-free iterative method for solving the GARE of the linear quadratic discrete-time zero-sum game. The effectiveness of the method is demonstrated by performing an H-infinity autopilot design for an F-16 aircraft.