Multiagent reinforcement learning and self-organization in a network of agents

  • Authors:
  • Sherief Abdallah; Victor Lesser

  • Affiliations:
  • British University in Dubai, Dubai, United Arab Emirates; University of Massachusetts, Amherst, MA

  • Venue:
  • Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
  • Year:
  • 2007

Abstract

To cope with large scale, agents are usually organized in a network such that an agent interacts only with its immediate neighbors in the network. Reinforcement learning techniques have been commonly used to optimize agents' local policies in such a network because they require little domain knowledge and can be fully distributed. However, all previous work assumed the underlying network was fixed throughout the learning process. This assumption was important because the underlying network defines the learning context of each agent. In particular, the set of actions and the state space for each agent are defined in terms of the agent's neighbors. If agents dynamically change the underlying network structure (also called self-organizing) during learning, then one needs a mechanism for transferring what agents have learned so far (in the old network structure) to their new learning context (in the new network structure). In this work we develop a novel self-organization mechanism that not only allows agents to self-organize the underlying network during the learning process, but also uses information from learning to guide the self-organization process. Consequently, our work is the first to study this interaction between learning and self-organization. Our self-organization mechanism uses heuristics to transfer the learned knowledge across the different steps of self-organization. We also present a more restricted version of our mechanism that is computationally less expensive and still achieves good performance. We use a simplified version of the distributed task allocation domain as our case study. Experimental results verify the stability of our approach and show a monotonic improvement in the performance of the learning process due to self-organization.
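The core idea of the abstract, that each agent's action set and state space are defined by its current neighbors, so rewiring the network requires transferring learned values into the new context, can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual algorithm: it uses simple bandit-style Q-value updates for a task-forwarding choice, and a made-up transfer heuristic that seeds a newly added neighbor's Q-value with the mean over existing neighbors rather than starting from zero.

```python
import random
from collections import defaultdict

class NetworkedAgent:
    """Sketch of an agent whose action space is its neighbor set.

    Hypothetical illustration only: the paper's mechanism uses its own
    learning algorithm and transfer heuristics; here a neighbor's Q-value
    estimates the value of forwarding a task to that neighbor.
    """

    def __init__(self, name, neighbors, alpha=0.1, epsilon=0.2):
        self.name = name
        self.neighbors = set(neighbors)   # defines the current action space
        self.q = defaultdict(float)       # q[n] = estimated value of forwarding to n
        self.alpha = alpha                # learning rate
        self.epsilon = epsilon            # exploration probability

    def choose_neighbor(self):
        # epsilon-greedy choice over the *current* neighbor set
        if random.random() < self.epsilon:
            return random.choice(sorted(self.neighbors))
        return max(sorted(self.neighbors), key=lambda n: self.q[n])

    def update(self, neighbor, reward):
        # incremental value update toward the observed reward
        self.q[neighbor] += self.alpha * (reward - self.q[neighbor])

    def add_neighbor(self, new):
        # Transfer heuristic (an assumption made for this sketch):
        # initialize the newcomer's Q-value to the mean over existing
        # neighbors, so prior learning seeds the new learning context.
        if self.neighbors:
            self.q[new] = sum(self.q[n] for n in self.neighbors) / len(self.neighbors)
        self.neighbors.add(new)

    def remove_neighbor(self, old):
        # dropping a link shrinks the action space and discards its estimate
        self.neighbors.discard(old)
        self.q.pop(old, None)
```

The point of the sketch is the `add_neighbor`/`remove_neighbor` pair: without some transfer rule, every self-organization step would reset part of what the agent has learned, which is exactly the problem the paper's heuristics address.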