A multiagent reinforcement learning algorithm by dynamically merging Markov decision processes

  • Authors:
  • Mohammad Ghavamzadeh; Sridhar Mahadevan

  • Affiliations:
  • University of Massachusetts Amherst, Amherst, MA; University of Massachusetts Amherst, Amherst, MA

  • Venue:
  • Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 2
  • Year:
  • 2002

Abstract

One general strategy for accelerating learning in cooperative multiagent problems is to reuse good or optimal solutions found when each agent acts alone. In this paper, we formalize this approach as dynamically merging solutions to multiple Markov decision processes (MDPs), each representing an individual agent's solution when acting alone, into a solution to the overall multiagent MDP in which all the agents act together. We present a new learning algorithm called MAPLE (MultiAgent Policy LEarning) that uses Q-learning and dynamic merging to efficiently construct global solutions to the overall multiagent problem from solutions to the individual MDPs. We demonstrate the efficiency of MAPLE by comparing its performance with that of standard Q-learning applied directly to the overall multiagent MDP.
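
The paper itself gives MAPLE's actual construction; as a rough illustration of the idea the abstract describes, the sketch below seeds a tabular joint Q-function by summing each agent's solo Q-values and then runs standard Q-learning on the joint MDP. The `env` interface (`reset`/`step`/`joint_actions`), the additive merge rule, and all hyperparameters are assumptions made for this example, not details taken from the paper.

```python
import random

def merge_q_values(single_agent_qs, joint_state, joint_action):
    """Heuristic merge: sum each agent's solo Q-value for its own
    component of the joint state/action (an assumed, illustrative rule)."""
    return sum(
        q.get((s, a), 0.0)
        for q, s, a in zip(single_agent_qs, joint_state, joint_action)
    )

def q_learning_with_merging(env, single_agent_qs, episodes=500,
                            alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning on the joint MDP, with joint Q-values lazily
    initialized by merging the individual agents' solutions."""
    q = {}

    def q_value(s, a):
        if (s, a) not in q:  # seed from the merged single-agent solutions
            q[(s, a)] = merge_q_values(single_agent_qs, s, a)
        return q[(s, a)]

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection over joint actions
            if random.random() < epsilon:
                action = random.choice(env.joint_actions)
            else:
                action = max(env.joint_actions,
                             key=lambda a: q_value(state, a))
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(
                q_value(next_state, a) for a in env.joint_actions)
            # standard Q-learning update on the joint MDP
            q[(state, action)] = q_value(state, action) + alpha * (
                reward + gamma * best_next - q_value(state, action))
            state = next_state
    return q
```

Under these assumptions, the merged values already approximate the joint optimum wherever the agents do not interact, so the joint learner starts far from a zero initialization, which is one plausible reading of why merging would outperform plain Q-learning on the joint MDP.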