Reinforcement learning for DEC-MDPs with changing action sets and partially ordered dependencies

  • Authors:
  • Thomas Gabel; Martin Riedmiller

  • Affiliations:
  • University of Osnabrück, Osnabrück, Germany (both authors)

  • Venue:
  • Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), Volume 3
  • Year:
  • 2008


Abstract

Decentralized Markov decision processes (DEC-MDPs) are frequently used to model cooperative multi-agent systems. In this paper, we identify a subclass of general DEC-MDPs that features regularities in the way agents interact with one another. This class is highly relevant to many real-world applications and has provably reduced complexity (NP-complete) compared to the general problem (NEXP-complete). Since optimally solving larger instances of NP-hard problems is intractable, we keep learning as decentralized as possible and use multi-agent reinforcement learning to improve the agents' behavior online. Further, we suggest a restricted message passing scheme that notifies other agents about forthcoming effects on their state transitions and allows the agents to acquire approximate joint policies of high quality.
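
The following is a minimal, illustrative sketch of the two ideas the abstract combines: independent Q-learning over a per-agent action set that changes over time, and a restricted message passing hook through which one agent notifies another of a forthcoming effect on its state transitions. This is not the authors' implementation; the class and method names (DecAgent, receive_notification) and the assumption that executed actions leave the action set while announced ones enter it are hypothetical.

```python
# Minimal sketch, assuming a scheduling-style setting where actions
# (e.g., tasks) migrate between agents. Not the paper's actual algorithm.
import random
from collections import defaultdict

class DecAgent:
    def __init__(self, agent_id, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.agent_id = agent_id
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)   # Q[(state, action)] -> estimated value
        self.action_set = set()       # actions currently executable here

    def receive_notification(self, action):
        """Restricted message passing: another agent announces that `action`
        will soon enter this agent's action set (a forthcoming effect on
        its state transitions)."""
        self.action_set.add(action)

    def choose_action(self, state):
        """Epsilon-greedy choice over the *current* (changing) action set."""
        if not self.action_set:
            return None               # idle until work arrives
        if random.random() < self.epsilon:
            return random.choice(sorted(self.action_set))
        return max(sorted(self.action_set), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One-step Q-learning backup, maximizing only over the actions
        still available after executing `action`."""
        self.action_set.discard(action)   # executed actions leave the set
        best_next = max((self.q[(next_state, a)] for a in self.action_set),
                        default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In such an instantiation, an agent that finishes executing an action would call receive_notification on the downstream agent specified by the (partially ordered) dependency structure, so each agent learns with mostly local information plus these restricted messages.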