Decentralized Markov decision processes (DEC-MDPs) are frequently used to model cooperative multi-agent systems. In this paper, we identify a subclass of general DEC-MDPs that features regularities in the way agents interact with one another. This class is highly relevant to many real-world applications and has provably reduced complexity (NP-complete) compared to the general problem (NEXP-complete). Since optimally solving larger instances of NP-hard problems is intractable, we keep learning as decentralized as possible and use multi-agent reinforcement learning to improve the agents' behavior online. Furthermore, we suggest a restricted message-passing scheme in which agents notify one another about forthcoming effects on their state transitions, allowing them to acquire approximate joint policies of high quality.
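The combination described above can be illustrated with a minimal sketch: independent Q-learners that see only their local state, plus a one-bit notification channel through which an agent announces a forthcoming effect on a peer's transition. The toy two-agent "prepare then execute" task, the class names, and all parameter values below are hypothetical choices for illustration, not the paper's actual algorithm or benchmark.

```python
import random
from collections import defaultdict

class Agent:
    """Independent Q-learner over its local state, augmented with a
    notification flag (the restricted message-passing channel)."""
    def __init__(self, n_actions, alpha=0.2, gamma=0.9, eps=0.1):
        self.Q = defaultdict(float)      # Q[(observation, action)] -> value
        self.n_actions = n_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.inbox = False               # set when a peer announces an effect

    def observe(self, local_state):
        obs = (local_state, self.inbox)  # local state + message flag
        self.inbox = False               # reading the message consumes it
        return obs

    def act(self, obs):
        if random.random() < self.eps:   # epsilon-greedy exploration
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.Q[(obs, a)])

    def update(self, obs, a, r, next_obs):
        best = max(self.Q[(next_obs, b)] for b in range(self.n_actions))
        self.Q[(obs, a)] += self.alpha * (r + self.gamma * best - self.Q[(obs, a)])


def train(episodes=500, steps=10, seed=0):
    """Toy cooperative task: agent 0 must 'prepare' (action 1) before
    agent 1's 'execute' (action 1) pays off; preparing sends a
    notification about the forthcoming effect on agent 1's transition."""
    random.seed(seed)
    agents = [Agent(2), Agent(2)]
    for _ in range(episodes):
        states = [0, 0]                  # 0 = idle, 1 = prepared
        obs = [ag.observe(s) for ag, s in zip(agents, states)]
        for _ in range(steps):
            acts = [ag.act(o) for ag, o in zip(agents, obs)]
            reward = 0.0
            if acts[0] == 1 and states[0] == 0:
                states[0] = 1
                agents[1].inbox = True   # restricted message to agent 1
            if acts[1] == 1:
                reward = 1.0 if states[0] == 1 else -0.1
                states = [0, 0]          # task resets after an execute attempt
            next_obs = [ag.observe(s) for ag, s in zip(agents, states)]
            for ag, o, a, no in zip(agents, obs, acts, next_obs):
                ag.update(o, a, reward, no)  # shared (cooperative) reward
            obs = next_obs
    return agents
```

After training, agent 1's Q-values for the flagged observation `(0, True)` should favor executing, i.e. the notification lets it act on a state change it cannot observe directly; without the flag, the observation `(0, False)` is ambiguous about whether agent 0 has prepared.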