Evaluation of the improved penalty avoiding rational policy making algorithm in real world environment

  • Authors:
  • Kazuteru Miyazaki;Masaki Itou;Hiroaki Kobayashi

  • Affiliations:
  • National Institution for Academic Degrees and University Evaluation, Japan;Toshiba Tec Corporation, Japan;Meiji University, Japan

  • Venue:
  • ACIIDS'12 Proceedings of the 4th Asian conference on Intelligent Information and Database Systems - Volume Part I
  • Year:
  • 2012

Abstract

We focus on the potential of Exploitation-oriented Learning (XoL) in non-Markov multi-agent environments. XoL retains some degree of rationality in non-Markov environments, and its effectiveness has been confirmed by computer simulations. The Penalty Avoiding Rational Policy Making algorithm (PARP), one of the XoL methods, was designed to learn a penalty-avoiding policy. Improved PARP is an extension of PARP that saves memory and copes with uncertainties. Although the effectiveness of Improved PARP has been confirmed in computer simulations, no results have been reported in a real-world environment. In this paper, we show the effectiveness of Improved PARP in a real-world environment using a keepaway task, a testbed for multi-agent soccer environments.