From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning

  • Authors:
  • Xi-Ren Cao

  • Affiliations:
  • Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

  • Venue:
  • Discrete Event Dynamic Systems
  • Year:
  • 2003

Abstract

Perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) share a common goal: to make decisions that improve system performance based on information obtained by analyzing the current system behavior. In this paper, we study the relations among these closely related fields. We show that MDP solutions can be derived naturally from the performance sensitivity analysis provided by PA. The performance potential plays an important role in both PA and MDPs; it also offers a clear, intuitive interpretation of many results. Reinforcement learning, TD(λ), neuro-dynamic programming, and related methods are efficient ways of estimating the performance potentials and related quantities from sample paths. The sensitivity point of view of PA, MDPs, and RL brings new insight into the area of learning and optimization. In particular, gradient-based optimization can be applied to parameterized systems with large state spaces, and gradient-based policy iteration can be applied to some nonstandard MDPs, such as systems with correlated actions. Potential-based on-line approaches and their advantages are also discussed.
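
To make the sensitivity viewpoint concrete, the following is a minimal sketch of the standard potential-based formulas the abstract refers to, stated in assumed notation rather than quoted from the paper: an ergodic chain with transition matrix P, reward vector f, stationary distribution π, average reward η = πf, and e the all-ones vector; primes denote another policy.

```latex
\begin{align*}
  % Poisson equation defining the performance potential g
  (I - P)\, g + \eta\, e &= f, \\
  % performance-difference formula for another policy (P', f')
  \eta' - \eta &= \pi' \big[ (f' + P' g) - (f + P g) \big], \\
  % performance-derivative formula along P_\delta = P + \delta (P' - P),
  % f_\delta = f + \delta (f' - f)
  \left.\frac{d\eta_\delta}{d\delta}\right|_{\delta = 0}
    &= \pi \big[ (f' + P' g) - (f + P g) \big].
\end{align*}
```

Because every component of π' is positive for an ergodic chain, choosing actions state by state to increase f(i) + Σ_j P(i,j) g(j) increases η; this is exactly the policy-improvement step, which is how policy iteration drops out of the difference formula. For the sample-path estimation of potentials mentioned in the abstract, below is a minimal TD(0)-style sketch on a hypothetical toy chain; all names, parameters, and the chain itself are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy 3-state ergodic Markov chain (illustrative assumption, not from the paper)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])
f = np.array([1.0, 3.0, 2.0])          # per-state reward

rng = np.random.default_rng(0)

def estimate_potentials(num_steps=200_000, alpha=0.01, beta=0.001):
    """TD(0)-style estimation of the potentials g and the average reward eta
    from a single sample path (a standard differential-TD sketch)."""
    g = np.zeros(3)                     # potential estimates
    eta = 0.0                           # running average-reward estimate
    x = 0                               # current state
    for _ in range(num_steps):
        x_next = rng.choice(3, p=P[x])
        # temporal-difference error associated with the Poisson equation
        delta = f[x] - eta + g[x_next] - g[x]
        g[x] += alpha * delta
        eta += beta * (f[x] - eta)
        x = x_next
    return g - g[0], eta                # potentials are defined up to an additive constant

g_hat, eta_hat = estimate_potentials()
print("estimated eta:", eta_hat)
print("estimated potentials (relative to state 0):", g_hat)
```

The estimated potentials can then be plugged directly into the improvement step above, or into a gradient estimate via the derivative formula, which is the sense in which PA-style sensitivity analysis, MDP policy iteration, and RL-style estimation fit together.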