In multiple-criteria Markov Decision Processes (MDPs), where several costs are incurred at every decision point, current methods minimise the expected primary cost criterion while constraining the expectations of the other cost criteria to given critical values. However, systems often face hard constraints, under which the cost criteria must never exceed their critical values at any time, rather than constraints on expected costs. For example, a resource-limited sensor network stops functioning once its energy is depleted. Based on the semi-MDP (sMDP) model, we study the hard-constrained (HC) problem in continuous time, with continuous state and action spaces, for both finite and infinite horizons and for various cost criteria. We show that the HCsMDP problem is NP-hard and that every HCsMDP has an equivalent discrete-time MDP. Hence, classical methods such as reinforcement learning can solve HCsMDPs.
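The equivalence result suggests a concrete solution route: fold the resources consumed so far (or remaining) into the state, so the hard constraint becomes an action-feasibility restriction in an ordinary discrete-time MDP, which tabular Q-learning can then solve. The sketch below is a minimal illustration of that idea on a hypothetical toy chain; the environment, cost figures, budget, and all identifiers (step, feasible, BUDGET, and so on) are illustrative assumptions, not the paper's construction.

```python
import random
from collections import defaultdict

# All of the following is a hypothetical toy setup, not the paper's model:
# a 5-state chain in which each step incurs a primary cost (to be minimised)
# and a deterministic resource cost that must NEVER exceed BUDGET in total.
BUDGET = 10                       # hard cap on accumulated resource cost
RESOURCE_COST = {0: 1, 1: 2}      # resource cost of each action (assumed)
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1

def step(state, action):
    """Toy dynamics: returns (next_state, primary_cost, resource_cost, done)."""
    next_state = min(state + (1 if action == 1 else 0), 4)
    primary_cost = 1.0 if next_state < 4 else 0.0
    return next_state, primary_cost, RESOURCE_COST[action], next_state == 4

def feasible(remaining):
    # Mask any action whose resource cost would breach the hard constraint;
    # this is what makes the constraint hard rather than an expectation.
    return [a for a in (0, 1) if RESOURCE_COST[a] <= remaining]

# Q-table over the AUGMENTED state (state, remaining budget): tracking the
# remaining budget in the state is the standard way to recover an ordinary
# discrete-time MDP from the hard-constrained problem.
Q = defaultdict(float)

def choose(state, remaining):
    acts = feasible(remaining)
    if random.random() < EPSILON:
        return random.choice(acts)
    return min(acts, key=lambda a: Q[(state, remaining, a)])  # minimise cost

for _ in range(5000):
    state, remaining, done = 0, BUDGET, False
    while not done and feasible(remaining):
        action = choose(state, remaining)
        nxt, cost, used, done = step(state, action)
        nxt_remaining = remaining - used
        acts = feasible(nxt_remaining)
        # Standard Q-learning backup on the augmented chain (costs, so min).
        target = cost if (done or not acts) else cost + GAMMA * min(
            Q[(nxt, nxt_remaining, a)] for a in acts)
        key = (state, remaining, action)
        Q[key] += ALPHA * (target - Q[key])
        state, remaining = nxt, nxt_remaining
```

Because actions are masked before they can overdraw the budget, the learned policy satisfies the constraint at every step, at the price of an enlarged (augmented) state space; this mirrors the abstract's claim that classical reinforcement learning applies once the equivalent discrete-time MDP is constructed.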