Stationary deterministic policies for constrained MDPs with multiple rewards, costs, and discount factors

Authors:
Dmitri Dolgov; Edmund Durfee
Affiliations:
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI (both authors)
Venue:
IJCAI'05: Proceedings of the 19th International Joint Conference on Artificial Intelligence
Year:
2005

Abstract

We consider the problem of policy optimization for a resource-limited agent with multiple time-dependent objectives, represented as a Markov decision process (MDP) with multiple discount factors in the objective function and constraints. We show that limiting the search to stationary deterministic policies, coupled with a novel reduction of the problem to mixed integer programming, yields a computationally feasible algorithm for finding such policies, where no such algorithm had previously been identified. In the simpler case where the constrained MDP has a single discount factor, our technique provides a new way of finding an optimal deterministic policy, where previous methods could only find randomized policies. We analyze the properties of our approach and describe implementation results.
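
To make the problem setting concrete, the following is a minimal sketch of a constrained MDP in which the reward and the cost are discounted at different rates. It is not the mixed integer programming reduction described in the abstract: it simply enumerates every stationary deterministic policy, which is only practical for tiny instances. All names here (`evaluate`, `best_deterministic_policy`, the toy transition, reward, and cost tables, and the budget values) are hypothetical and chosen purely for illustration.

```python
from itertools import product


def evaluate(policy, P, signal, gamma, alpha, iters=2000):
    """Expected discounted total of `signal` under a fixed deterministic policy.

    P[s][a][t] is the probability of moving from state s to state t under
    action a, signal[s][a] is the per-step reward or cost, and alpha is the
    initial state distribution. Uses simple iterative policy evaluation.
    """
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [
            signal[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return sum(alpha[s] * v[s] for s in range(n))


def best_deterministic_policy(P, reward, cost, gamma_r, gamma_c, alpha, budget):
    """Brute-force search over all stationary deterministic policies.

    Returns (policy, reward_value, cost_value) for the feasible policy with
    the highest discounted reward, or None if no policy meets the budget.
    """
    n_states, n_actions = len(P), len(P[0])
    best = None
    for policy in product(range(n_actions), repeat=n_states):
        c = evaluate(policy, P, cost, gamma_c, alpha)
        if c > budget + 1e-9:  # violates the discounted-cost constraint
            continue
        r = evaluate(policy, P, reward, gamma_r, alpha)
        if best is None or r > best[1]:
            best = (policy, r, c)
    return best


# Toy instance (assumed): action 0 always leads to state 0, action 1 to state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
reward = [[1.0, 2.0], [2.0, 3.0]]
cost = [[0.0, 0.0], [0.0, 1.0]]   # only action 1 in state 1 consumes resource
alpha = [1.0, 0.0]                # the agent starts in state 0

tight = best_deterministic_policy(P, reward, cost, 0.9, 0.5, alpha, budget=0.5)
loose = best_deterministic_policy(P, reward, cost, 0.9, 0.5, alpha, budget=10.0)
print("budget 0.5:", tight)
print("budget 10:", loose)
```

On this toy instance, the tight budget rules out the policy that stays in state 1, so the enumeration selects the policy that alternates between the two states. Because the number of deterministic policies grows exponentially with the number of states, this enumeration does not scale, which is why a more efficient search procedure matters.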