The penalty avoiding rational policy making algorithm in continuous action spaces

  • Authors:
  • Kazuteru Miyazaki

  • Affiliations:
  • National Institution for Academic Degrees and University Evaluation, Kodaira, Tokyo, Japan

  • Venue:
  • IDEAL'10: Proceedings of the 11th International Conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2010

Abstract

Reinforcement learning involves learning to adapt to an environment through the presentation of rewards, a special input that serves as a clue. To obtain rational policies quickly, methods such as profit sharing, the rational policy making algorithm, the penalty avoiding rational policy making algorithm (PARP), PS-r*, and PS-r# are used; collectively, these are called Exploitation-oriented Learning (XoL). Applying reinforcement learning to real problems sometimes requires handling continuous-valued input and output. A PARP-based method has been proposed as a XoL method that handles continuous-valued input, but it cannot treat continuous-valued output. We study a treatment of continuous-valued output suited to a XoL method in an environment that includes both a reward and a penalty. We extend PARP with continuous-valued input to continuous-valued output, apply the proposal to the pole-cart balancing problem, and confirm its validity.
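
The abstract does not spell out the algorithm, but the core PARP idea it builds on is: remember which rules (state-action pairs) have ever led to a penalty, never reselect them, and exploit rewarded rules among the rest. Below is a minimal illustrative sketch of that idea with a continuous scalar action, assuming a simple per-state discretised bookkeeping scheme; the class and method names (PenaltyAvoidingAgent, act, record) and the binning scheme are our own illustration, not the paper's method.

```python
import random
from collections import defaultdict

class PenaltyAvoidingAgent:
    """Illustrative penalty-avoiding agent with a continuous action
    in [lo, hi]. Sketch only; not the algorithm from the paper."""

    def __init__(self, n_bins=10, action_range=(-1.0, 1.0)):
        self.n_bins = n_bins
        self.lo, self.hi = action_range
        # Per state: list of (action, outcome) pairs,
        # where outcome is +1 (reward) or -1 (penalty).
        self.memory = defaultdict(list)

    def _bin(self, action):
        # Discretise the continuous action for bookkeeping.
        width = (self.hi - self.lo) / self.n_bins
        return min(int((action - self.lo) / width), self.n_bins - 1)

    def _penalty_bins(self, state):
        # Action bins that have ever led to a penalty are forbidden,
        # mirroring PARP's "never reselect a penalty rule" idea.
        return {self._bin(a) for a, o in self.memory[state] if o < 0}

    def act(self, state):
        forbidden = self._penalty_bins(state)
        rewarded = [a for a, o in self.memory[state]
                    if o > 0 and self._bin(a) not in forbidden]
        if rewarded:
            # Exploit: average the rewarded continuous actions.
            return sum(rewarded) / len(rewarded)
        # Otherwise explore a continuous action from a safe bin.
        safe = [b for b in range(self.n_bins) if b not in forbidden]
        if not safe:
            safe = list(range(self.n_bins))  # no safe bin is known yet
        b = random.choice(safe)
        width = (self.hi - self.lo) / self.n_bins
        return self.lo + (b + random.random()) * width

    def record(self, state, action, outcome):
        # Store the observed outcome of taking `action` in `state`.
        self.memory[state].append((action, outcome))

# Usage: after observing a penalty, that action region is avoided.
agent = PenaltyAvoidingAgent()
agent.record(state=0, action=-0.95, outcome=-1)
agent.record(state=0, action=0.3, outcome=+1)
print(agent.act(0))  # exploits near 0.3, never picks the penalised bin
```

In the paper's pole-cart setting, the outcome signal would come from the balancing task itself (a penalty when the pole falls); the contribution the abstract describes is handling the continuous action directly rather than through a fixed discrete action set, which this sketch only approximates.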