Influence diagrams with memory states: representation and algorithms

Authors:
Xiaojian Wu;Akshat Kumar;Shlomo Zilberstein
Affiliations:
Computer Science Department, University of Massachusetts, Amherst, MA;Computer Science Department, University of Massachusetts, Amherst, MA;Computer Science Department, University of Massachusetts, Amherst, MA
Venue:
ADT'11: Proceedings of the Second International Conference on Algorithmic Decision Theory
Year:
2011

Abstract

Influence diagrams (IDs) offer a powerful framework for decision making under uncertainty, but their applicability has been hindered by the exponential growth of runtime and memory usage, largely due to the no-forgetting assumption. We present a novel way to maintain a limited amount of memory to inform each decision and still obtain near-optimal policies. The approach is based on augmenting the graphical model with memory states that represent key aspects of previous observations, a method that has proved useful in POMDP solvers. We also derive an efficient EM-based message-passing algorithm to compute the policy. Experimental results show that this approach produces high-quality approximate policies and offers better scalability than existing methods.
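
The scalability argument in the abstract can be illustrated with a small sketch. It is not the paper's algorithm. It only contrasts the number of policy-table rows a decision needs under no-forgetting with the number needed when the decision sees a bounded memory state. The function names, the deterministic memory update, and the assumption of one discrete observation per decision stage are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def no_forgetting_table_size(num_obs: int, stage: int) -> int:
    """Rows needed when the decision at `stage` (1-indexed) conditions on
    every observation seen so far, assuming one observation per stage."""
    return num_obs ** stage


def memory_state_table_size(num_obs: int, num_memory: int) -> int:
    """Rows needed when the decision conditions only on the current
    memory state and the current observation (illustrative assumption)."""
    return num_memory * num_obs


def run_memory(observations: Sequence[int],
               transition: List[List[int]],
               start: int = 0) -> int:
    """Fold an observation sequence into a single memory state.

    `transition[m][o]` gives the next memory state after seeing
    observation `o` in memory state `m`. A deterministic update is used
    here for simplicity; a stochastic update would also fit the idea.
    """
    memory = start
    for obs in observations:
        memory = transition[memory][obs]
    return memory


# Hypothetical two-state memory that tracks the parity of observation 1.
PARITY_TRANSITION = [[0, 1], [1, 0]]
```

With binary observations, `no_forgetting_table_size(2, 10)` gives 1024 rows for the tenth decision, while `memory_state_table_size(2, 3)` stays at 6 rows at every stage. The cost of the bounded table is that the memory state keeps only a summary of the history, which is why the resulting policies are approximate.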