Single sample path-based optimization of Markov chains
Journal of Optimization Theory and Applications - Special issue in honor of Yu-Chi Ho
Limiting Average Criteria For Nonstationary Markov Decision Processes
SIAM Journal on Optimization
The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes
Discrete Event Dynamic Systems
From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning
Discrete Event Dynamic Systems
Convergence of Simulation-Based Policy Iteration
Probability in the Engineering and Informational Sciences
Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research
Minimax control for discrete-time time-varying stochastic systems
Automatica (Journal of IFAC)
Basic Ideas for Event-Based Optimization of Markov Systems
Discrete Event Dynamic Systems
STEWARD: demo of spatio-textual extraction on the web aiding the retrieval of documents
dg.o '07 Proceedings of the 8th annual international conference on Digital government research: bridging disciplines & domains
Reinforcement Learning: A Tutorial Survey and Recent Advances
INFORMS Journal on Computing
Continuous-time Markov decision processes with nth-bias optimality criteria
Automatica (Journal of IFAC)
Bias optimality for multichain continuous-time Markov decision processes
Operations Research Letters
We propose a unified framework for Markov decision problems and performance sensitivity analysis of multichain Markov processes under both discounted and average-reward performance criteria. Building on the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established from the performance-difference formulas in a simple and intuitive way, while the performance-gradient formulas, combined with stochastic approximation, may lead to new optimization schemes. This sensitivity-based view of performance optimization provides insights that link perturbation analysis, Markov decision processes, and reinforcement learning. The research extends previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
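To illustrate the discounted-reward case, here is a minimal sketch of policy iteration in which the policy potentials are obtained by solving the linear evaluation equation and a greedy improvement step is justified by the performance-difference formula. This is an illustrative implementation under our own assumptions (a finite MDP given as per-action transition matrices and reward vectors; the variable names are ours, not the paper's):

```python
import numpy as np

def policy_iteration(P, r, beta=0.9):
    """Policy iteration for a finite discounted MDP (illustrative sketch).

    P : (A, S, S) array, P[a] is the transition matrix under action a.
    r : (A, S) array, r[a, s] is the one-step reward for action a in state s.
    beta : discount factor in (0, 1).

    The discounted potentials g of a stationary policy d solve
        g = r_d + beta * P_d g,
    and by the performance-difference formula, a policy that is greedy
    with respect to g performs at least as well in every state.
    """
    A, S, _ = P.shape
    d = np.zeros(S, dtype=int)              # start with action 0 in every state
    while True:
        P_d = P[d, np.arange(S)]            # (S, S) transitions under policy d
        r_d = r[d, np.arange(S)]            # (S,) rewards under policy d
        # Policy evaluation: solve (I - beta * P_d) g = r_d for the potentials.
        g = np.linalg.solve(np.eye(S) - beta * P_d, r_d)
        # Policy improvement: act greedily on the action values Q(a, s).
        Q = r + beta * P @ g                # (A, S)
        d_new = Q.argmax(axis=0)
        if np.array_equal(d_new, d):
            return d, g                     # converged to an optimal policy
        d = d_new

# Toy example: state 1 pays reward 1, state 0 pays 0;
# action 0 stays put, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # stay
              [[0.0, 1.0], [1.0, 0.0]]])   # switch
r = np.array([[0.0, 1.0],
              [0.0, 1.0]])
d, g = policy_iteration(P, r)
```

In this toy problem the optimal policy switches out of state 0 and stays in state 1, and the potentials satisfy g(1) > g(0), reflecting the higher long-run reward of being in state 1.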