Stochastic control via direct comparison

Authors:
Xi-Ren Cao;De-Xin Wang;Tao Lu;Yifan Xu
Affiliations:
Shanghai Jiaotong University, Shanghai, China;Hong Kong University of Science and Technology, Hong Kong, Hong Kong;Hong Kong University of Science and Technology, Hong Kong, Hong Kong;Fudan University, Shanghai, China
Venue:
Discrete Event Dynamic Systems
Year:
2011



Abstract

The standard approach to stochastic control is dynamic programming. In this paper, we introduce an alternative approach based on direct comparison of the performance of any two policies. This is achieved by modeling the state process as a continuous-time, continuous-state Markov process and applying the same ideas as in the discrete-time, discrete-state case. The approach is simple and intuitively clear, and it applies in the same way to a range of problems: finite and infinite horizons, discounted and long-run average performance, and continuous and jump diffusions. Discounting is not needed when dealing with long-run average performance. The approach provides a unified framework for stochastic control and other optimization theories and methodologies, including Markov decision processes, perturbation analysis, and reinforcement learning.
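
As a rough illustration of the discrete-time, discrete-state idea that the abstract says carries over to the continuous setting, the sketch below numerically checks the long-run average performance difference formula for two finite ergodic Markov chains. The formula is not stated in the abstract; it is the standard discrete-case identity η' − η = π'[(P' − P)g + (f' − f)], where π' is the stationary distribution under the second policy and g (the performance potential of the first policy) solves the Poisson equation (I − P)g + ηe = f. The transition matrices, reward vectors, and function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an ergodic transition matrix P."""
    n = P.shape[0]
    # Stack the balance equations with the normalization constraint.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def potential(P, f):
    """Return (g, eta): g solves (I - P) g + eta e = f, normalized so pi g = 0."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    eta = pi @ f  # long-run average reward under (P, f)
    # For an ergodic chain, I - P + e pi is invertible.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return g, eta


def performance_difference(P, f, P2, f2):
    """Compute eta' - eta using only g of policy 1 and pi' of policy 2."""
    g, _ = potential(P, f)
    pi2 = stationary_distribution(P2)
    return pi2 @ ((P2 - P) @ g + (f2 - f))


# Hypothetical two-policy example on three states (illustrative numbers only).
P = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]])
f = np.array([1.0, 2.0, 0.5])
P2 = np.array([[0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.4, 0.1, 0.5]])
f2 = np.array([1.5, 1.0, 2.0])

diff_by_formula = performance_difference(P, f, P2, f2)
diff_direct = stationary_distribution(P2) @ f2 - stationary_distribution(P) @ f
print(diff_by_formula, diff_direct)  # the two values should agree
```

The comparison needs only the potential g of one policy and the stationary distribution of the other. Because π' is nonnegative, η' ≥ η whenever (P' − P)g + (f' − f) is nonnegative in every component, which is the kind of condition a direct-comparison argument exploits in this discrete setting.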