Proposal of Exploitation-Oriented Learning PS-r#

  • Authors:
  • Kazuteru Miyazaki; Shigenobu Kobayashi

  • Affiliations:
  • National Institution for Academic Degrees and University Evaluation, Kodaira-city, Tokyo, Japan 187-8587; Tokyo Institute of Technology, Yokohama, Japan 226-8502

  • Venue:
  • IDEAL '08 Proceedings of the 9th International Conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2008


Abstract

Exploitation-oriented Learning (XoL) is a novel approach to goal-directed learning from interaction. While reinforcement learning focuses more on exploration-based learning and can guarantee optimality in Markov Decision Process (MDP) environments, XoL aims to learn a rational policy, i.e., one whose expected reward per action is larger than zero, very quickly. PS-r* is one of the XoL methods; it can learn a useful rational policy that is not inferior to a random walk in Partially Observable Markov Decision Process (POMDP) environments where there is only one type of reward. However, PS-r* requires O(MN²) memory, where N and M are the numbers of types of sensory input and action, respectively. In this paper, we propose PS-r#, which can learn a useful rational policy in the same POMDP environments with only O(MN) memory. We confirm the effectiveness of PS-r# through numerical examples.
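To make the memory claim concrete, the following is a minimal sketch (not the authors' implementation) of the table sizes implied by the complexity bounds. It assumes, purely for illustration, that an O(MN²) method keeps one statistic per (observation, action, next-observation) triple, while an O(MN) method keeps a single value per (observation, action) pair; the function names and layout are hypothetical.

```python
# Hypothetical memory-footprint comparison for the two complexity classes.
# N = number of sensory-input (observation) types, M = number of action types.

def entries_o_mn2(n_obs: int, n_act: int) -> int:
    """O(MN^2): assumed one entry per (observation, action, next-observation)."""
    return n_act * n_obs * n_obs

def entries_o_mn(n_obs: int, n_act: int) -> int:
    """O(MN): assumed one entry per (observation, action) pair."""
    return n_act * n_obs

if __name__ == "__main__":
    n_act = 4  # example action count
    for n_obs in (10, 100, 1000):
        print(f"N={n_obs}: O(MN^2) -> {entries_o_mn2(n_obs, n_act)} entries, "
              f"O(MN) -> {entries_o_mn(n_obs, n_act)} entries")
```

The gap grows linearly in N: at N = 1000 observations and M = 4 actions, the O(MN²) layout needs 4,000,000 entries versus 4,000 for the O(MN) layout, which is the practical motivation for PS-r# stated in the abstract.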