Multiple paired forward and inverse models for motor control. Neural Networks, special issue on neural control and robotics: biology and technology.
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence.
Introduction to Reinforcement Learning.
Multiple model-based reinforcement learning. Neural Computation.
Training products of experts by minimizing contrastive divergence. Neural Computation.
Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01).
MOSAIC model for sensorimotor learning and control. Neural Computation.
In real-world problems, the environment surrounding a controlled system is often nonstationary, so the optimal control may change over time. Such controls are difficult to learn with standard reinforcement learning (RL), which usually assumes a stationary Markov decision process. Doya et al. previously proposed a modular RL method in which multiple paired predictors and controllers are gated to produce nonstationary controls, and demonstrated its effectiveness on nonstationary problems. However, the learned time-dependent decomposition into the constituent pairs could be unstable, and the resulting control was difficult to interpret because of the heuristic combination of predictors and controllers. To overcome these difficulties, we propose a new modular RL algorithm in which the predictors are learned in a self-organized manner to realize a stable decomposition, and the controllers are optimized by a policy gradient-based RL method. Computer simulations show that our method achieves faster and more stable learning than the previous one.
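The gating idea underlying this family of modular methods can be illustrated with a minimal sketch: each module's predictor is scored by its prediction error, the errors are converted into soft "responsibility" weights (in the spirit of the MOSAIC / multiple model-based RL papers cited above), and the per-module control outputs are blended by those weights. The function names, the Gaussian error model, and the noise scale `sigma` are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def responsibilities(pred_errors, sigma=1.0):
    """Soft assignment of modules from prediction errors.

    Modules whose predictors match the current dynamics well receive
    high weight. Assumes a Gaussian error model with noise scale
    `sigma` (an illustrative choice, not from the paper).
    """
    log_lik = -0.5 * (np.asarray(pred_errors, dtype=float) / sigma) ** 2
    w = np.exp(log_lik - log_lik.max())  # subtract max for stability
    return w / w.sum()

def mixed_control(controls, pred_errors, sigma=1.0):
    """Blend per-module control outputs by their responsibilities."""
    lam = responsibilities(pred_errors, sigma)
    return lam @ np.asarray(controls, dtype=float)

# Module 0 predicts the current dynamics far better than module 1,
# so its control output dominates the blended command.
lam = responsibilities([0.1, 2.0])
u = mixed_control([1.0, -1.0], [0.1, 2.0])
```

In the earlier heuristic scheme the same responsibility signal gates both learning and control; the improvement described in the abstract replaces the heuristic controller combination with a policy gradient-based optimization while the predictors self-organize the decomposition.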