Efficient behavior learning by utilizing estimated state value of self and teammates

  • Authors:
  • Kouki Shimada; Yasutake Takahashi; Minoru Asada

  • Affiliations:
  • Dept. of Adaptive Machine Systems, Graduate School of Engineering, Osaka University (all authors)

  • Venue:
  • RoboCup 2009
  • Year:
  • 2010

Abstract

Reinforcement learning applications to real robots in multi-agent dynamic environments are limited by the huge exploration space and the enormously long learning time. RoboCup competitions are a typical example, since the other agents and their behaviors easily cause state and action space explosion. This paper presents a method that utilizes the state value functions of macro actions to explore appropriate behavior efficiently in a multi-agent environment, so that the learning agent can acquire behavior that is cooperative with its teammates and competitive against its opponents. The key ideas are as follows. First, the agent learns a few macro actions and their state value functions by reinforcement learning beforehand. Second, an appropriate initial controller for learning cooperative behavior is generated from these state value functions. The initial controller uses the state values of the macro actions so that the learner tends to select good macro actions and to avoid useless ones. By combining these ideas with a two-layer hierarchical system, the proposed method performs better during learning than conventional methods. This paper presents a case study of a 4-on-5 game task (four defenders against five attackers), in which the learning agent (a passer on the offense team) successfully acquired teamwork plays (pass and shoot) within a shorter learning time.
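To make the second idea concrete, the sketch below shows one plausible way an initial controller could bias macro-action selection toward actions whose pre-learned state value functions rate the current state highly. The macro-action names, the softmax selection rule, and the dummy value lookup are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical macro actions for a soccer passer; names are assumptions.
MACRO_ACTIONS = ["pass_to_teammate", "dribble", "shoot"]

def macro_state_values(state):
    """Placeholder for the state value functions V_a(s), one per macro
    action, learned beforehand by ordinary reinforcement learning.
    A real implementation would look up each macro action's learned
    value table or function approximator; dummy values are returned here."""
    return np.array([0.7, 0.2, 0.5])

def initial_controller(state, temperature=0.5):
    """Initial controller sketch: a softmax over the macro actions'
    state values, so that high-value macro actions are selected often
    and useless ones are rarely explored at the start of
    cooperative-behavior learning."""
    values = macro_state_values(state)
    prefs = np.exp(values / temperature)
    probs = prefs / prefs.sum()
    return np.random.choice(len(MACRO_ACTIONS), p=probs)

# Example: pick a macro action for the current (dummy) state.
action = initial_controller(state=None)
print(MACRO_ACTIONS[action])
```

Under this reading, the pre-learned values only shape the starting policy; subsequent learning in the multi-agent task can still revise the action preferences as experience accumulates.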