Developing reinforcement learning for adaptive co-construction of continuous high-dimensional state and action spaces

Authors:
Masato Nagayoshi;Hajime Murao;Hisashi Tamaki
Affiliations:
Niigata College of Nursing, Joetsu, Japan 943-0147;Faculty of Cross-Cultural Studies, Kobe University, Kobe, Japan 657-8501;Graduate School of Engineering, Kobe University, Kobe, Japan 657-8501
Venue:
Artificial Life and Robotics
Year:
2012

Abstract

Engineers and researchers are paying more attention to reinforcement learning (RL) as a key technique for realizing adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. Our approach mainly deals with the problem of designing state and action spaces. Previously, an adaptive state space construction method called a "state space filter" and an adaptive action space construction method called "switching RL" were proposed, each on the assumption that the other space is fixed. We then combined these two construction methods into a single method that mimics an infant's perceptual and motor development, and proposed an approach based on introducing and referring to "entropy". In this paper, a computational experiment was conducted on a so-called "robot navigation problem" with a three-dimensional continuous state space and a two-dimensional continuous action space, which is more complicated than a so-called "path planning problem". The results confirmed the validity of the proposed method.