The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes
Discrete Event Dynamic Systems
The goals of perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) coincide: to make decisions that improve system performance based on information obtained by analyzing the current system behavior. In this paper, we study the relations among these closely related fields. We show that MDP solutions can be derived naturally from the performance sensitivity analysis provided by PA. The performance potential plays an important role in both PA and MDPs, and it offers a clear, intuitive interpretation of many results. Reinforcement learning methods such as TD(λ) and neuro-dynamic programming are efficient ways of estimating the performance potentials and related quantities from sample paths. The sensitivity point of view of PA, MDPs, and RL brings new insight to the area of learning and optimization. In particular, gradient-based optimization can be applied to parameterized systems with large state spaces, and gradient-based policy iteration can be applied to some nonstandard MDPs, such as systems with correlated actions. Potential-based on-line approaches and their advantages are also discussed.
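The sample-path estimation of performance potentials mentioned above can be illustrated with a minimal sketch: an average-reward TD(0)-style update on a toy Markov chain. The chain, reward vector, step size, and run length below are our own hypothetical choices for illustration, not taken from the paper:

```python
import numpy as np

# A hypothetical 3-state Markov chain with per-state rewards.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 2.0, 0.5])

rng = np.random.default_rng(0)

# Estimate the average reward eta and the performance potentials g
# from a single sample path (average-reward TD(0)-style updates).
g = np.zeros(3)       # potential estimates
eta = 0.0             # running estimate of the average reward
x = 0                 # start state
T = 200_000
for t in range(1, T + 1):
    x_next = rng.choice(3, p=P[x])          # simulate one transition
    eta += (r[x] - eta) / t                 # running average of observed rewards
    delta = r[x] - eta + g[x_next] - g[x]   # average-reward TD error
    g[x] += 0.01 * delta                    # constant step size (illustrative)
    x = x_next

# Potentials are defined only up to an additive constant; pin g[0] = 0.
g -= g[0]
```

The resulting `g` approximately satisfies the Poisson equation g = r − ηe + Pg (up to the pinned constant); such potential estimates are the quantities that potential-based policy iteration and gradient-based optimization consume.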