Exploiting structure in decentralized markov decision processes

Authors:
Victor Lesser;Shlomo Zilberstein;Raphen Becker
Affiliations:
University of Massachusetts Amherst;University of Massachusetts Amherst;University of Massachusetts Amherst
Venue:
Exploiting structure in decentralized markov decision processes
Year:
2006

Citing 0
Cited 3

Constraint-based dynamic programming for decentralized POMDPs with structured interactions

Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
A bilinear programming approach for multiagent planning

Journal of Artificial Intelligence Research
Event-detecting multi-agent MDPs: complexity and constant-factor approximation

IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

While formal, decision-theoretic models such as the Markov Decision Process (MDP) have greatly advanced the field of single-agent control, application of similar ideas to multi-agent domains has proven problematic. The advantages of such an approach over traditional heuristic and experimental models of multi-agent systems include a more accurate representation of the underlying problem, a more easily defined notion of optimality and the potential for significantly better solutions. The difficulty often comes from the tradeoff between the expressiveness of the model and the complexity of finding an optimal solution. Much of the research in this area has focused on the extremes of this tradeoff. At one extreme are models where each agent has a global view of the world, and solving these problems is no harder than solving single-agent problems. At the other extreme lie very general, decentralized models, which are also nearly impossible to solve optimally. The work proposed here explores the middle-ground by starting with a general decentralized Markov decision process and introducing structure that can be exploited to reduce the complexity. I present two decision-theoretic models that structure the interactions between agents in two different ways. In the first model the agents are independent except for an extra reward signal that depends on each of the agents' histories. In the second model the agents have independent rewards but there is a structured interaction between their transition probabilities. Both of these models can be optimally and approximately solved using my Coverage Set Algorithm. I also extend the first model by allowing the agents to communicate and I introduce an algorithm that finds an optimal joint communication policy for a fixed joint domain-level policy.