Brief paper: Policy iteration based feedback control

Authors:
Kan-Jian Zhang;Yan-Kai Xu;Xi Chen;Xi-Ren Cao
Affiliations:
Research Institute of Automation, Southeast University, Nanjing 210096, China;CFINS, Department of Automation, Tsinghua University, Beijing 100084, China;CFINS, Department of Automation, Tsinghua University, Beijing 100084, China;Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Venue:
Automatica (Journal of IFAC)
Year:
2008

Abstract

It is well known that stochastic control systems can be viewed as Markov decision processes (MDPs) with continuous state spaces. In this paper, we propose applying the policy iteration approach of MDPs to the optimal control problem of stochastic systems. We first provide an optimality equation based on performance potentials and develop a policy iteration procedure. We then apply policy iteration to the jump linear quadratic problem and obtain the coupled Riccati equations for its optimal solution. The approach is applicable to linear as well as nonlinear systems and can be implemented online on real-world systems without identifying the complete system structure and all of its parameters.
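
To make the idea of policy iteration with performance potentials concrete, the following is a minimal sketch on a finite-state, discrete-time, average-cost MDP. It is not the procedure of the paper, which treats continuous state spaces. The two-state model, the names `solve_linear`, `evaluate` and `policy_iteration`, the normalization g[0] = 0, and the sign convention of the Poisson equation are all assumptions chosen for illustration. Every policy is assumed to induce a single recurrent class, so that the Poisson equation has a unique solution under that normalization.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def evaluate(policy, P, f):
    """Policy evaluation: solve the Poisson equation
    eta + g[s] - sum_j P[s][a][j] * g[j] = f[s][a], with a = policy[s],
    under the (assumed) normalization g[0] = 0.
    Returns the average cost eta and the potentials g."""
    n = len(policy)
    a_mat, b_vec = [], []
    for s in range(n):
        p = P[s][policy[s]]
        row = [1.0]  # coefficient of eta
        row += [(1.0 if j == s else 0.0) - p[j] for j in range(1, n)]
        a_mat.append(row)
        b_vec.append(f[s][policy[s]])
    x = solve_linear(a_mat, b_vec)
    return x[0], [0.0] + x[1:]


def policy_iteration(P, f, policy):
    """Alternate evaluation and improvement until the policy is stable."""
    while True:
        eta, g = evaluate(policy, P, f)
        new_policy = []
        for s in range(len(policy)):
            def q(a):
                # One-step cost plus expected potential of the next state.
                return f[s][a] + sum(p * gj for p, gj in zip(P[s][a], g))
            best = policy[s]  # keep the current action on ties
            for a in range(len(P[s])):
                if q(a) < q(best) - 1e-12:
                    best = a
            new_policy.append(best)
        if new_policy == policy:
            return policy, eta, g
        policy = new_policy


# Hypothetical two-state, two-action model: P[s][a] lists the next-state
# probabilities and f[s][a] is the one-step cost.
P = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.2, 0.8], [0.7, 0.3]]]
f = [[1.0, 1.5],
     [4.0, 5.0]]

policy, eta, g = policy_iteration(P, f, [0, 0])
print(policy, round(eta, 6), [round(v, 6) for v in g])
```

Starting from the policy `[0, 0]`, whose average cost is 2, one improvement step switches the action in state 1 and the iteration stops at `[0, 1]` with an average cost of about 1.5. Only the potentials of the current policy enter the improvement step, which is the property that makes estimating them from a sample path, rather than from a full model, attractive.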