Module-Based Reinforcement Learning: Experiments with a Real Robot

  • Authors:
  • Zsolt Kalmár, Csaba Szepesvári, András Lörincz

  • Affiliations:
  • Department of Informatics, “József Attila” University of Szeged, Szeged, Aradi vrt. tere 1, Hungary H-6720. E-mail: kalmar@mindmaker.kfkipark.hu
  • Research Group on Artificial Intelligence, “József Attila” University of Szeged, Szeged, Aradi vrt. tere 1, Hungary H-6720. E-mail: szepes@mindmaker.kfkipark.hu
  • Department of Adaptive Systems, “József Attila” University of Szeged, Szeged, Aradi vrt. tere 1, Hungary H-6720. E-mail: lorincz@mindmaker.kfkipark.hu

  • Venue:
  • Machine Learning - Special issue on learning in autonomous robots
  • Year:
  • 1998


Abstract

The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solve such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: i) decompose the task into subtasks using the qualitative knowledge at hand; ii) design local controllers to solve the subtasks using the available quantitative knowledge; and iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA, and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over non-adaptive ones in complex environments.
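To make the three-step design process concrete, the sketch below shows one plausible reading of step iii): a tabular Q-learning agent that learns which hand-designed local controller (module) to activate in each high-level feature state. This is only an illustrative sketch, not the authors' implementation; the ModuleSwitcher class, the discrete feature-state abstraction, and the epsilon-greedy one-step Q-learning update are all assumptions, and the paper's better-performing model-based variant is not shown.

```python
# Hypothetical sketch of learning to coordinate fixed local controllers.
# Assumption: the robot's sensors are abstracted into a discrete, hashable
# "feature state", and each module is a hand-designed controller that runs
# until its own termination condition fires.
import random
from collections import defaultdict

class ModuleSwitcher:
    def __init__(self, n_modules, epsilon=0.1, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)   # Q-values over (feature_state, module)
        self.n_modules = n_modules
        self.epsilon = epsilon        # exploration rate
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def select(self, state):
        # Epsilon-greedy choice among the local controllers.
        if random.random() < self.epsilon:
            return random.randrange(self.n_modules)
        return max(range(self.n_modules), key=lambda m: self.q[(state, m)])

    def update(self, state, module, reward, next_state):
        # One-step Q-learning update at the switching level, applied when
        # the chosen module terminates and control returns to the switcher.
        best_next = max(self.q[(next_state, m)] for m in range(self.n_modules))
        td_target = reward + self.gamma * best_next
        self.q[(state, module)] += self.alpha * (td_target - self.q[(state, module)])
```

In this reading, the learner only decides *which* controller runs next, never the low-level motor commands, which is what keeps the learning problem small and makes fine-tuning of the local controllers unnecessary; a model-based variant would additionally estimate the transition and reward structure over feature states and plan against it rather than relying on the sampled update above.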