Constrained partially observable Markov decision processes (CPOMDPs) extend standard POMDPs by allowing the specification of constraints on aspects of the policy in addition to the optimality objective on the value function. CPOMDPs offer practical advantages over standard POMDPs because they naturally model problems involving limited resources or multiple objectives. In this paper, we show that optimal policies in CPOMDPs may need to be randomized, and we present exact and approximate dynamic programming methods for computing randomized optimal policies. The exact method requires solving a minimax quadratically constrained program (QCP) in each dynamic programming update, whereas the approximate method performs point-based value updates using a linear program (LP). We show that randomized policies significantly outperform deterministic ones, and we demonstrate that the approximate point-based method scales to large problems.
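To make the LP-based update concrete, the following is a minimal sketch (not the authors' implementation) of the linear program solved at a single belief point: given candidate alpha-vectors with associated cost vectors, choose a randomized mixture that maximizes expected reward while keeping expected cumulative cost within an admissible budget. The belief, vectors, and budget below are illustrative numbers, not from any benchmark domain.

```python
import numpy as np
from scipy.optimize import linprog

b = np.array([0.6, 0.4])                  # belief over 2 states
alphas = np.array([[10.0, 0.0],           # reward alpha-vectors (hypothetical)
                   [4.0, 6.0],
                   [1.0, 1.0]])
costs = np.array([[5.0, 5.0],             # matching cumulative-cost vectors
                  [2.0, 2.0],
                  [0.0, 0.0]])
budget = 3.0                              # admissible expected cost at b

values = alphas @ b                       # expected reward of each vector at b
cost_vals = costs @ b                     # expected cost of each vector at b

# Maximize sum_i w_i * values[i]  <=>  minimize -values @ w, subject to:
#   cost_vals @ w <= budget   (expected cost within the admissible budget)
#   sum_i w_i = 1, w >= 0     (w is a probability distribution over vectors)
res = linprog(
    c=-values,
    A_ub=cost_vals[np.newaxis, :], b_ub=[budget],
    A_eq=np.ones((1, len(values))), b_eq=[1.0],
    bounds=(0, None),
)
w = res.x                                 # randomized choice over the vectors
print("mixture weights:", np.round(w, 3))
print("expected value:", round(float(values @ w), 3))
print("expected cost:", round(float(cost_vals @ w), 3))
```

With these numbers, the LP mixes the first two vectors (weights 1/3 and 2/3), attaining expected value 5.2 at exactly the cost budget, whereas the best deterministic (single-vector) feasible choice only achieves 4.8 — illustrating why randomization can strictly improve on deterministic policies under constraints.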