Efficient behavior learning by utilizing estimated state value of self and teammates

  • Authors:
  • Kouki Shimada; Yasutake Takahashi; Minoru Asada

  • Affiliations:
  • Dept. of Adaptive Machine Systems, Graduate School of Engineering, Osaka University (all authors)

  • Venue:
  • RoboCup 2009
  • Year:
  • 2010

Abstract

Reinforcement learning applications to real robots in multi-agent dynamic environments are limited by the huge exploration space and the enormously long learning time. RoboCup competitions are a typical example, since the other agents and their behaviors easily cause state and action space explosion. This paper presents a method that utilizes the state value functions of macro actions to explore appropriate behavior efficiently in a multi-agent environment, so that the learning agent can acquire behavior that is cooperative with its teammates and competitive against its opponents. The key ideas are as follows. First, the agent learns a few macro actions and their state value functions by reinforcement learning beforehand. Second, an appropriate initial controller for learning cooperative behavior is generated from these state value functions. The initial controller uses the state values of the macro actions so that the learner tends to select good macro actions and to avoid useless ones. By combining these ideas with a two-layer hierarchical system, the proposed method performs better during learning than conventional methods. This paper presents a case study of a 4-on-5 game task (four defenders against five attackers), in which the learning agent (a passer on the offense team) successfully acquired teamwork plays (pass and shoot) within a shorter learning time.
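To make the second idea concrete, the sketch below shows one plausible way an initial controller could bias macro-action selection toward actions whose pre-learned state value functions rate the current state highly. The macro-action names, the softmax selection rule, and the dummy value lookup are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical macro actions for a soccer passer; names are assumptions.
MACRO_ACTIONS = ["pass_to_teammate", "dribble", "shoot"]

def macro_state_values(state):
    """Placeholder for the state value functions V_a(s), one per macro
    action, learned beforehand by ordinary reinforcement learning.
    A real implementation would look up each macro action's learned
    value table or function approximator; dummy values are returned here."""
    return np.array([0.7, 0.2, 0.5])

def initial_controller(state, temperature=0.5):
    """Initial controller sketch: a softmax over the macro actions'
    state values, so that high-value macro actions are selected often
    and useless ones are rarely explored at the start of
    cooperative-behavior learning."""
    values = macro_state_values(state)
    prefs = np.exp(values / temperature)
    probs = prefs / prefs.sum()
    return np.random.choice(len(MACRO_ACTIONS), p=probs)

# Example: pick a macro action for the current (dummy) state.
action = initial_controller(state=None)
print(MACRO_ACTIONS[action])
```

Under this reading, the pre-learned values only shape the starting policy; subsequent learning in the multi-agent task can still revise the action preferences as experience accumulates.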