Efficient reward functions for adaptive multi-rover systems

  • Authors:
  • Kagan Tumer; Adrian Agogino

  • Affiliations:
  • NASA Ames Research Center, Moffett Field, CA; UC Santa Cruz, NASA Ames Research Center, Moffett Field, CA

  • Venue:
  • LAMAS'05: Proceedings of the First International Conference on Learning and Adaption in Multi-Agent Systems
  • Year:
  • 2005

Abstract

This chapter focuses on deriving reward functions that allow multiple agents to co-evolve efficient control policies that maximize a system-level reward in noisy and dynamic environments. The solution we present is based on agent rewards satisfying two crucial properties. First, the agent reward function and the global reward function have to be aligned; that is, an agent maximizing its agent-specific reward should also maximize the global reward. Second, the agent has to receive sufficient “signal” from its reward; that is, an agent's actions should have a large influence over its agent-specific reward. Agents using rewards with these two properties evolve the correct policies quickly. This hypothesis is tested in episodic and non-episodic, continuous-space multi-rover environments where rovers evolve to maximize a global reward function defined over all rovers. The environments are dynamic (i.e., they change over time), noisy, and restrict communication between agents. We show that a control policy evolved using agent-specific rewards satisfying the above properties outperforms policies evolved using global rewards by up to 400%. More notably, with a larger number of rovers, or with rovers whose sensors are noisy and communication-limited, the proposed method outperforms the global reward by an even higher percentage than it does in noise-free conditions with a small number of rovers.
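The abstract does not give the reward formulas, but one well-known way to satisfy both properties at once, explored in related work by these authors, is a difference-style reward D_i = G(z) − G(z_{−i}), the global reward with and without agent i's contribution. The sketch below is a minimal illustration only: the point-of-interest global reward, the function names, and all parameters are assumptions for the example, not details taken from this paper.

```python
import numpy as np

def global_reward(rover_positions, poi_positions, poi_values, min_dist=1.0):
    """Hypothetical global reward for a rover domain: each point of interest
    (POI) is credited according to the closest rover that observes it."""
    if rover_positions.shape[0] == 0:
        return 0.0
    G = 0.0
    for poi, value in zip(poi_positions, poi_values):
        # Squared distance from every rover to this POI, bounded below.
        dists = np.maximum(np.sum((rover_positions - poi) ** 2, axis=1), min_dist)
        G += value / dists.min()  # credit the closest observation
    return G

def difference_reward(i, rover_positions, poi_positions, poi_values):
    """Difference-style agent reward: D_i = G(z) - G(z without rover i).
    Subtracting the counterfactual keeps D_i aligned with G while making it
    far more sensitive to rover i's own actions than G itself."""
    G_full = global_reward(rover_positions, poi_positions, poi_values)
    without_i = np.delete(rover_positions, i, axis=0)
    G_without = global_reward(without_i, poi_positions, poi_values)
    return G_full - G_without
```

In this toy setup, an agent's reward changes only through terms it actually affects, which captures the “sufficient signal” property, while any action that raises D_i also raises G, which captures alignment.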