Constrained partially observable Markov decision processes (CPOMDPs) extend standard POMDPs by allowing the specification of constraints on aspects of the policy in addition to the optimality objective on the value function. CPOMDPs offer practical advantages over standard POMDPs because they naturally model problems involving limited resources or multiple objectives. In this paper, we show that optimal policies in CPOMDPs may need to be randomized, and we present exact and approximate dynamic programming methods for computing randomized optimal policies. The exact method requires solving a minimax quadratically constrained program (QCP) in each dynamic programming update, whereas the approximate method performs point-based value updates using a linear program (LP). We show that randomized policies significantly outperform deterministic ones, and we demonstrate that the approximate point-based method scales to large problems.
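To make the LP-based update concrete, the following is a minimal sketch (not the authors' implementation) of the linear program solved at a single belief point: given candidate alpha-vectors with associated cost vectors, choose a randomized mixture that maximizes expected reward while keeping expected cumulative cost within an admissible budget. The belief, vectors, and budget below are illustrative numbers, not from any benchmark domain.

```python
import numpy as np
from scipy.optimize import linprog

b = np.array([0.6, 0.4])                  # belief over 2 states
alphas = np.array([[10.0, 0.0],           # reward alpha-vectors (hypothetical)
                   [4.0, 6.0],
                   [1.0, 1.0]])
costs = np.array([[5.0, 5.0],             # matching cumulative-cost vectors
                  [2.0, 2.0],
                  [0.0, 0.0]])
budget = 3.0                              # admissible expected cost at b

values = alphas @ b                       # expected reward of each vector at b
cost_vals = costs @ b                     # expected cost of each vector at b

# Maximize sum_i w_i * values[i]  <=>  minimize -values @ w, subject to:
#   cost_vals @ w <= budget   (expected cost within the admissible budget)
#   sum_i w_i = 1, w >= 0     (w is a probability distribution over vectors)
res = linprog(
    c=-values,
    A_ub=cost_vals[np.newaxis, :], b_ub=[budget],
    A_eq=np.ones((1, len(values))), b_eq=[1.0],
    bounds=(0, None),
)
w = res.x                                 # randomized choice over the vectors
print("mixture weights:", np.round(w, 3))
print("expected value:", round(float(values @ w), 3))
print("expected cost:", round(float(cost_vals @ w), 3))
```

With these numbers, the LP mixes the first two vectors (weights 1/3 and 2/3), attaining expected value 5.2 at exactly the cost budget, whereas the best deterministic (single-vector) feasible choice only achieves 4.8 — illustrating why randomization can strictly improve on deterministic policies under constraints.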