Bounded policy iteration for decentralized POMDPs

  • Authors:
  • Daniel S. Bernstein (Dept. of Computer Science, University of Massachusetts, Amherst, MA)
  • Eric A. Hansen (Dept. of CS and Engineering, Mississippi State University, MS)
  • Shlomo Zilberstein (Dept. of Computer Science, University of Massachusetts, Amherst, MA)

  • Venue:
  • IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
  • Year:
  • 2005


Abstract

We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs.
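The joint controller described in the abstract pairs a stochastic finite-state controller per agent with a shared correlation device that each agent can observe but cannot use to communicate. The sketch below illustrates that structure only; the class names, the `listen`/`open` actions, and the specific distributions are hypothetical illustrations, not the paper's algorithm or parameters.

```python
import random

class CorrelationDevice:
    """Shared random signal: it evolves on its own, independent of the
    agents' actions and observations, so observing it lets agents
    correlate behavior without exchanging information."""
    def __init__(self, states, transition):
        self.states = states          # device states, e.g. ["c0", "c1"]
        self.transition = transition  # state -> {next_state: probability}
        self.state = states[0]

    def step(self, rng):
        probs = self.transition[self.state]
        self.state = rng.choices(self.states,
                                 weights=[probs[s] for s in self.states])[0]
        return self.state

class LocalController:
    """Stochastic finite-state controller for one agent.  Both the action
    distribution and the node-transition distribution may condition on the
    current correlation-device state (hypothetical encoding)."""
    def __init__(self, start, action_dist, node_trans):
        self.node = start
        self.action_dist = action_dist  # (node, c) -> {action: prob}
        self.node_trans = node_trans    # (node, c, obs) -> {next_node: prob}

    def act(self, c, rng):
        dist = self.action_dist[(self.node, c)]
        acts = list(dist)
        return rng.choices(acts, weights=[dist[a] for a in acts])[0]

    def update(self, c, obs, rng):
        dist = self.node_trans[(self.node, c, obs)]
        nodes = list(dist)
        self.node = rng.choices(nodes, weights=[dist[n] for n in nodes])[0]

# Toy demo with one agent, one controller node, and a two-state device.
rng = random.Random(0)
device = CorrelationDevice(["c0", "c1"],
                           {"c0": {"c0": 0.5, "c1": 0.5},
                            "c1": {"c0": 0.5, "c1": 0.5}})
obs_set = ("hear-left", "hear-right")
agent = LocalController(
    start="q0",
    action_dist={("q0", "c0"): {"listen": 0.9, "open": 0.1},
                 ("q0", "c1"): {"listen": 0.1, "open": 0.9}},
    node_trans={("q0", c, o): {"q0": 1.0}
                for c in ("c0", "c1") for o in obs_set},
)
for _ in range(3):
    c = device.step(rng)          # all agents observe the same signal
    a = agent.act(c, rng)         # sample an action from the local policy
    agent.update(c, rng.choice(obs_set), rng)  # stochastic node transition
```

In a multi-agent run, each agent would hold its own `LocalController`, while the single `CorrelationDevice` is sampled once per step and its state shown to everyone, which is what allows correlated joint behavior with no execution-time communication.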