Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer the benefits of quick convergence and the ability to operate directly on the solution, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations, which makes its evaluation and improvement costly. Bounded policy iteration keeps the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration. In this paper, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how to perform policy iteration in settings formalized by the interactive POMDP framework. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative; the generalization here makes it useful in non-cooperative settings as well. Because interactive POMDPs involve modeling other agents, we ascribe nested controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We evaluate our approach on multiple problem domains and demonstrate its properties and scalability.
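To make the notion of evaluating a finite state controller concrete, the following is a minimal sketch of the evaluation step for an ordinary single-agent POMDP only. It is not the paper's interactive POMDP algorithm and does not handle nested controllers. The function name `evaluate_controller`, the array layouts, and the toy problem are all assumptions introduced for illustration. Each controller node n selects action a with probability psi[n][a] and moves to node n2 with probability eta[n][a][o][n2] after observing o; the value of every (node, state) pair is found by iterating the standard fixed-point equation.

```python
def evaluate_controller(T, O, R, gamma, psi, eta, tol=1e-10, max_iters=100_000):
    """Iteratively evaluate a stochastic finite state controller.

    Hypothetical layouts (assumptions for this sketch):
      T[s][a][s2]       transition probability
      O[s2][a][o]       observation probability after reaching s2 via a
      R[s][a]           immediate reward
      psi[n][a]         probability that node n picks action a
      eta[n][a][o][n2]  probability of moving from node n to n2 given (a, o)
    Returns V with V[n][s] = expected discounted value of node n in state s.
    """
    n_nodes, n_states = len(psi), len(T)
    n_actions, n_obs = len(R[0]), len(O[0][0])
    V = [[0.0] * n_states for _ in range(n_nodes)]
    for _ in range(max_iters):
        new_V = [[0.0] * n_states for _ in range(n_nodes)]
        for n in range(n_nodes):
            for s in range(n_states):
                total = 0.0
                for a in range(n_actions):
                    if psi[n][a] == 0.0:
                        continue
                    future = 0.0
                    for s2 in range(n_states):
                        for o in range(n_obs):
                            p = T[s][a][s2] * O[s2][a][o]
                            if p == 0.0:
                                continue
                            # Expected value over the successor node.
                            future += p * sum(
                                eta[n][a][o][n2] * V[n2][s2]
                                for n2 in range(n_nodes)
                            )
                    total += psi[n][a] * (R[s][a] + gamma * future)
                new_V[n][s] = total
        delta = max(
            abs(new_V[n][s] - V[n][s])
            for n in range(n_nodes)
            for s in range(n_states)
        )
        V = new_V
        if delta < tol:
            break
    return V


# Toy problem (made up): 2 states, 2 actions, 2 observations.
# Action 0 always pays 1, action 1 always pays 0.
T = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0], [1.0, 0.0]]

# Two-node controller: node 0 plays action 0 then moves to node 1;
# node 1 plays action 1 then moves back to node 0.
psi = [[1.0, 0.0], [0.0, 1.0]]
to_node0, to_node1 = [1.0, 0.0], [0.0, 1.0]
eta = [
    [[to_node1, to_node1], [to_node1, to_node1]],
    [[to_node0, to_node0], [to_node0, to_node0]],
]
values = evaluate_controller(T, O, R, 0.9, psi, eta)
```

Because the controller alternates between a reward of 1 and a reward of 0, the value of node 0 satisfies V0 = 1 + 0.9 * 0.9 * V0, so V0 = 1 / 0.19 in every state. The cost of each sweep grows with the number of nodes, which is one way to see why an unbounded controller becomes expensive to evaluate and why fixing its size is attractive.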