Exploiting structure in decentralized markov decision processes

  • Authors:
  • Victor Lesser;Shlomo Zilberstein;Raphen Becker

  • Affiliations:
  • University of Massachusetts Amherst;University of Massachusetts Amherst;University of Massachusetts Amherst

  • Venue:
  • Exploiting structure in decentralized markov decision processes
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

While formal, decision-theoretic models such as the Markov Decision Process (MDP) have greatly advanced the field of single-agent control, application of similar ideas to multi-agent domains has proven problematic. The advantages of such an approach over traditional heuristic and experimental models of multi-agent systems include a more accurate representation of the underlying problem, a more easily defined notion of optimality and the potential for significantly better solutions. The difficulty often comes from the tradeoff between the expressiveness of the model and the complexity of finding an optimal solution. Much of the research in this area has focused on the extremes of this tradeoff. At one extreme are models where each agent has a global view of the world, and solving these problems is no harder than solving single-agent problems. At the other extreme lie very general, decentralized models, which are also nearly impossible to solve optimally. The work proposed here explores the middle-ground by starting with a general decentralized Markov decision process and introducing structure that can be exploited to reduce the complexity. I present two decision-theoretic models that structure the interactions between agents in two different ways. In the first model the agents are independent except for an extra reward signal that depends on each of the agents' histories. In the second model the agents have independent rewards but there is a structured interaction between their transition probabilities. Both of these models can be optimally and approximately solved using my Coverage Set Algorithm. I also extend the first model by allowing the agents to communicate and I introduce an algorithm that finds an optimal joint communication policy for a fixed joint domain-level policy.