Generalized and bounded policy iteration for finitely-nested interactive POMDPs: scaling up

  • Authors:
  • Ekhlas Sonu; Prashant Doshi

  • Affiliations:
  • University of Georgia, Athens, GA; University of Georgia, Athens, GA

  • Venue:
  • Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
  • Year:
  • 2012

Abstract

Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer the benefits of quick convergence and the ability to operate directly on the solution, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations, making its evaluation and improvement costly. Bounded policy iteration provides a way of keeping the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration. In this paper, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how we may perform policy iteration in settings formalized by the interactive POMDP framework. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative; its generalization here makes it useful in non-cooperative settings as well. Because interactive POMDPs involve modeling other agents, we ascribe nested controllers to them in order to predict their actions, with the benefit that the controllers compactly represent the model space. We evaluate our approach on multiple problem domains and demonstrate its properties and scalability.
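
To make the central object of the abstract concrete, the sketch below shows a minimal deterministic finite state controller for a single-agent POMDP and its evaluation step, i.e., the linear system that policy iteration solves before improving nodes. This is only an illustrative sketch, not the authors' algorithm or implementation; the names `Controller`, `evaluate`, `T`, `O`, `R`, and `gamma` are hypothetical, and the bounded-improvement step and the nesting of controllers for other agents from the paper are omitted.

```python
# Illustrative sketch (assumed API, not the paper's code): a deterministic
# finite state controller and exact policy evaluation for a POMDP.
import numpy as np

class Controller:
    def __init__(self, actions, transitions):
        # actions[n]       : action selected at controller node n
        # transitions[n][o]: successor node after receiving observation o in node n
        self.actions = actions
        self.transitions = transitions
        self.num_nodes = len(actions)

def evaluate(ctrl, T, O, R, gamma):
    """Solve V(n, s) = R(s, a_n)
         + gamma * sum_{s', o} T(s, a_n, s') * O(a_n, s', o) * V(delta(n, o), s')
    as one linear system over (node, state) pairs.
    Shapes: T[s, a, s'], O[a, s', o], R[s, a]."""
    N, S, num_obs = ctrl.num_nodes, T.shape[0], O.shape[2]
    A = np.eye(N * S)
    b = np.zeros(N * S)
    for n in range(N):
        a = ctrl.actions[n]
        for s in range(S):
            row = n * S + s
            b[row] = R[s, a]
            for s2 in range(S):
                for o in range(num_obs):
                    n2 = ctrl.transitions[n][o]
                    A[row, n2 * S + s2] -= gamma * T[s, a, s2] * O[a, s2, o]
    return np.linalg.solve(A, b).reshape(N, S)
```

Bounded policy iteration alternates this evaluation with a node-wise improvement step (a small linear program per node) that replaces a node's action and observation mapping without adding new nodes, which is how the controller size stays fixed while its value improves monotonically.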