Proposal of Exploitation-Oriented Learning PS-r#

  • Authors:
  • Kazuteru Miyazaki; Shigenobu Kobayashi

  • Affiliations:
  • National Institution for Academic Degrees and University Evaluation, Kodaira-city, Tokyo, Japan 187-8587; Tokyo Institute of Technology, Yokohama, Japan 226-8502

  • Venue:
  • IDEAL '08 Proceedings of the 9th International Conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2008


Abstract

Exploitation-oriented Learning (XoL) is a novel approach to goal-directed learning from interaction. While reinforcement learning focuses more on exploration-based learning and can guarantee optimality in Markov Decision Process (MDP) environments, XoL aims to learn a rational policy, i.e., one whose expected reward per action is larger than zero, very quickly. PS-r* is one of the XoL methods; it can learn a useful rational policy that is not inferior to a random walk in Partially Observable Markov Decision Process (POMDP) environments where there is only one type of reward. However, PS-r* requires O(MN²) memory, where N and M are the numbers of types of sensory input and action, respectively. In this paper, we propose PS-r#, which can learn a useful rational policy in the same POMDP environments with only O(MN) memory. We confirm the effectiveness of PS-r# through numerical examples.
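To make the memory claim concrete, the following is a minimal sketch (not the authors' implementation) of the table sizes implied by the complexity bounds. It assumes, purely for illustration, that an O(MN²) method keeps one statistic per (observation, action, next-observation) triple, while an O(MN) method keeps a single value per (observation, action) pair; the function names and layout are hypothetical.

```python
# Hypothetical memory-footprint comparison for the two complexity classes.
# N = number of sensory-input (observation) types, M = number of action types.

def entries_o_mn2(n_obs: int, n_act: int) -> int:
    """O(MN^2): assumed one entry per (observation, action, next-observation)."""
    return n_act * n_obs * n_obs

def entries_o_mn(n_obs: int, n_act: int) -> int:
    """O(MN): assumed one entry per (observation, action) pair."""
    return n_act * n_obs

if __name__ == "__main__":
    n_act = 4  # example action count
    for n_obs in (10, 100, 1000):
        print(f"N={n_obs}: O(MN^2) -> {entries_o_mn2(n_obs, n_act)} entries, "
              f"O(MN) -> {entries_o_mn(n_obs, n_act)} entries")
```

The gap grows linearly in N: at N = 1000 observations and M = 4 actions, the O(MN²) layout needs 4,000,000 entries versus 4,000 for the O(MN) layout, which is the practical motivation for PS-r# stated in the abstract.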