The goal of this paper is two-fold. First, we present a sensitivity point of view on the optimization of Markov systems. We show that Markov decision processes (MDPs) and the policy-gradient approach, or perturbation analysis (PA), can be derived easily from two fundamental sensitivity formulas, and that such formulas can be constructed flexibly, from first principles, with performance potentials as building blocks. Second, with this sensitivity view we propose an event-based optimization approach, comprising event-based sensitivity analysis and event-based policy iteration. This approach exploits the special features of a system characterized by events; it illustrates how the potentials can be aggregated using these features and how the aggregated potentials can be used in policy iteration. Compared with the traditional MDP approach, the event-based approach has several advantages: the number of aggregated potentials may scale with the system size even though the number of states grows exponentially in the system size, which reduces the policy space and saves computation; the approach does not require the actions at different states to be independent; and it exploits the special features of a system and does not require knowledge of the exact transition probability matrix. The main ideas of the approach are illustrated by an admission control problem.
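To make the role of performance potentials concrete, here is a minimal sketch (not code from the paper; the two-state MDP below is an invented example) of average-reward policy iteration in which the potentials g are obtained from the Poisson equation (I - P)g = f - eta*1 and then used in the policy-improvement step:

```python
import numpy as np

def potentials(P, f):
    """Performance potentials g and average reward eta of an ergodic
    chain with transition matrix P and reward vector f, obtained from
    the Poisson equation (I - P) g = f - eta * 1."""
    n = len(f)
    # Stationary distribution: pi P = pi, sum(pi) = 1 (consistent
    # overdetermined system, solved by least squares).
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    eta = float(pi @ f)
    # Fundamental-matrix form: g = (I - P + 1 pi^T)^{-1} (f - eta 1),
    # which fixes the additive constant so that pi @ g = 0.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return g, eta

def policy_iteration(P_a, f_a):
    """Average-reward policy iteration driven by potentials.
    P_a[s][a] is the next-state distribution and f_a[s][a] the
    one-step reward under action a in state s."""
    n = len(P_a)
    policy = [0] * n
    while True:
        P = np.array([P_a[s][policy[s]] for s in range(n)])
        f = np.array([f_a[s][policy[s]] for s in range(n)], dtype=float)
        g, eta = potentials(P, f)
        # Improvement: greedy in f(s, a) + sum_j p(j | s, a) g(j).
        new = [max(range(len(P_a[s])),
                   key=lambda a: f_a[s][a] + np.dot(P_a[s][a], g))
               for s in range(n)]
        if new == policy:
            return policy, eta
        policy = new
```

The event-based approach in the paper replaces the per-state potentials above with potentials aggregated over events, so the improvement step ranges over far fewer quantities when the event space is small.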