We propose a time aggregation approach for solving infinite-horizon average-cost Markov decision processes via policy iteration. In this approach, the policy is updated only when the process visits a subset of the state space. As in state aggregation, this yields a reduced state space, which may substantially reduce computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model due to the loss of the Markov property, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Single sample path-based estimation algorithms are developed that allow the time aggregation approach to be implemented online for practical systems. Numerical and simulation examples illustrate the ideas and the potential computational savings.
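The claim that time aggregation loses no accuracy can be checked numerically for a fixed policy. The sketch below is not taken from the paper. It assumes one standard construction: the chain is observed only at visits to a subset `S1`, giving an embedded Markov chain on `S1`, and the average cost is recovered as the ratio of expected cost per segment to expected segment length. The transition matrix `P`, cost vector `f`, subset `S1`, and all function names are hypothetical illustration values, and the policy update step of policy iteration is not shown.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.
    Assumes A is square and nonsingular."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [x - k * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def stationary(P):
    """Stationary distribution of an irreducible chain: pi P = pi, sum(pi) = 1."""
    n = len(P)
    # Row i encodes sum_j (delta_ij - P[j][i]) * pi_j = 0.
    A = [[(1.0 if i == j else 0.0) - P[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n  # replace one redundant equation with normalization
    return solve(A, [0.0] * (n - 1) + [1.0])


def time_aggregate(P, f, S1):
    """Return the embedded chain on S1, the expected cost per segment, and the
    expected segment length, where a segment runs from one visit to S1 up to
    (but not including) the next visit."""
    in_S1 = set(S1)
    S2 = [s for s in range(len(P)) if s not in in_S1]
    I_P22 = [[(1.0 if a == c else 0.0) - P[a][c] for c in S2] for a in S2]
    # hit[jj][k]: probability that, starting from S2[k], the first state
    # visited in S1 is S1[jj].
    hit = [solve(I_P22, [P[a][j] for a in S2]) for j in S1]
    # Expected cost and expected number of steps spent in S2 before reaching S1.
    cost = solve(I_P22, [f[a] for a in S2])
    steps = solve(I_P22, [1.0] * len(S2))
    P_emb = [[P[i][j] + sum(P[i][a] * hit[jj][k] for k, a in enumerate(S2))
              for jj, j in enumerate(S1)] for i in S1]
    h_f = [f[i] + sum(P[i][a] * cost[k] for k, a in enumerate(S2)) for i in S1]
    h_1 = [1.0 + sum(P[i][a] * steps[k] for k, a in enumerate(S2)) for i in S1]
    return P_emb, h_f, h_1


# Hypothetical 4-state chain under a fixed policy; S1 = {0, 1} is the subset
# of states at which the process is observed.
P = [[0.1, 0.3, 0.4, 0.2],
     [0.3, 0.1, 0.2, 0.4],
     [0.2, 0.2, 0.5, 0.1],
     [0.4, 0.1, 0.2, 0.3]]
f = [1.0, 4.0, 2.0, 3.0]
S1 = [0, 1]

# Average cost computed on the full state space.
pi = stationary(P)
eta_full = sum(p * c for p, c in zip(pi, f))

# Average cost computed from the reduced (time-aggregated) model.
P_emb, h_f, h_1 = time_aggregate(P, f, S1)
pi_emb = stationary(P_emb)
eta_agg = (sum(p * c for p, c in zip(pi_emb, h_f))
           / sum(p * t for p, t in zip(pi_emb, h_1)))

print(eta_full, eta_agg)
```

The two printed values should agree up to floating-point error, since the embedded chain is itself a Markov chain and the ratio form follows from the renewal-reward argument. The stationary distribution is then solved on two states instead of four, although building the reduced model here still requires linear solves over the remaining states.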