Extraction of reward-related feature space using correlation-based and reward-based learning methods
ICONIP'10 Proceedings of the 17th international conference on Neural information processing: theory and algorithms - Volume Part I
Learning chasing behaviours of non-player characters in games using SARSA
EvoApplications'11 Proceedings of the 2011 international conference on Applications of evolutionary computation - Volume Part I
We investigated coordination among agents in a goal-finding task in a partially observable environment. In our problem formulation, the task was to locate a goal in a 2D space, but no information related to the goal was given to the agents unless they had formed a swarm. Furthermore, the goal had to be located by a swarm of agents, not by a single agent. In this study, cooperative behaviours among agents were learned using our proposed context-dependent multiagent SARSA algorithm (CDM-SARSA). In essence, instead of tracking the actions of all the agents in the Q-table, i.e., Q(s, a), CDM-SARSA tracked only the action a_i of agent i and the context c resulting from the actions of all the agents, i.e., Q_i(s, a_i, c). This approach reduced the size of the table considerably: tracking the joint action of all agents is impractical, since the joint state-action space grows exponentially with every new agent added to the system. In our opinion, tracking the context abstracted away unnecessary details, and this approach was a logical solution for the multiagent reinforcement learning task. The proposed approach for learning cooperative behaviours was demonstrated with different numbers of agents and different grid sizes. The empirical results confirmed that the proposed CDM-SARSA learned cooperative behaviours successfully.
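The core idea in the abstract is replacing the joint Q-table Q(s, a_1, ..., a_n) with per-agent tables Q_i(s, a_i, c), where the context c summarizes the effect of the other agents' actions. The paper's source code is not given here, so the following is only a minimal sketch of that idea under assumed details: epsilon-greedy action selection, a standard SARSA update, and hypothetical values for the learning rate, discount factor, and exploration rate.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration rate


class CDMSarsaAgent:
    """Sketch of one agent's learner: the table is indexed by
    (state, own action, context), not by the joint action of all agents."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q = defaultdict(float)  # Q_i(s, a_i, c), initialised to 0

    def act(self, state, context):
        """Epsilon-greedy choice over this agent's own actions."""
        if random.random() < EPS:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions),
                   key=lambda a: self.q[(state, a, context)])

    def update(self, s, a, c, reward, s2, a2, c2):
        """Standard SARSA update applied to the reduced (s, a_i, c) table:
        Q_i(s, a_i, c) += alpha * (r + gamma * Q_i(s', a_i', c') - Q_i(s, a_i, c))."""
        target = reward + GAMMA * self.q[(s2, a2, c2)]
        self.q[(s, a, c)] += ALPHA * (target - self.q[(s, a, c)])
```

Under this sketch, each agent's table grows with |S| x |A| x |C| rather than |S| x |A|^n, which matches the abstract's motivation: a coarse context label (for instance, whether the agents currently form a swarm) stands in for the full joint action.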