Module-Based Reinforcement Learning: Experiments with a Real Robot
Machine Learning - Special issue on learning in autonomous robots
The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. Robot-learning domains, on the other hand, are inherently infinite in both time and space, and moreover they are only partially observable. In this article we suggest a systematic design method whose motivation comes from the desire to transform the task to be solved into a finite-state, discrete-time, "approximately" Markovian task that is also completely observable. The key idea is to break the problem up into subtasks and to design a controller for each subtask. Operating conditions are then attached to the controllers (a controller together with its operating condition is called a module), and additional features may be designed to facilitate observability. A new discrete time-counter is introduced at the "module level" that ticks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared, and a model-based approach was found to work best. The learnt switching strategy performed as well as a handcrafted version. Moreover, the learnt strategy appeared to exploit certain properties of the environment that could not have been foreseen in advance, suggesting the promising possibility that a learnt controller might eventually outperform a handcrafted switching strategy.
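To make the module-level loop concrete, here is a minimal, illustrative Python sketch. It is not the authors' implementation: the environment interface, module names, and features are hypothetical, and for brevity it learns the switching strategy with tabular Q-learning over the event-driven module-level task, whereas the paper reports that a model-based method worked best.

```python
"""Sketch of module-based RL: hand-designed modules (controller +
operating condition), binary features for observability, and an
event-driven module-level clock that ticks only on feature changes.
All names and the env interface are illustrative assumptions."""

import random
from collections import defaultdict

# Each module pairs a low-level controller (selected by name via
# env.step) with an operating condition stating when it may be used.
MODULES = {
    "approach": lambda obs: obs["target_visible"],
    "avoid":    lambda obs: obs["obstacle_near"],
    "search":   lambda obs: True,  # always-applicable fallback
}

def features(obs):
    """Designed binary features that make the module-level task
    finite-state and approximately Markovian."""
    return (obs["target_visible"], obs["obstacle_near"], obs["target_reached"])

def learn_switching(env, episodes=100, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning over module choices; one decision per
    feature-change event, not per low-level control step."""
    Q = defaultdict(float)  # keyed by (feature-vector, module-name)
    for _ in range(episodes):
        obs = env.reset()           # hypothetical interface
        s, done = features(obs), False
        while not done:
            applicable = [m for m, cond in MODULES.items() if cond(obs)]
            if random.random() < eps:
                a = random.choice(applicable)
            else:
                a = max(applicable, key=lambda m: Q[s, m])
            # Run the chosen module until some feature changes:
            # this is the event-driven "tick" of the module-level clock.
            r = 0.0
            while features(obs) == s and not done:
                obs, reward, done = env.step(a)  # one low-level step of module a
                r += reward
            s2 = features(obs)
            best = 0.0 if done else max(Q[s2, m] for m in MODULES)
            Q[s, a] += alpha * (r + gamma * best - Q[s, a])
            s = s2
    return Q
```

Because the agent only decides at feature-change events, the number of learning decisions per episode is small even though the underlying control runs in (near-)continuous time, which is what makes the transformed task tractable for standard finite-state RL methods.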