Module-Based Reinforcement Learning: Experiments with a Real Robot

  • Authors:
  • Zsolt Kalmár; Csaba Szepesvári; András Lőrincz

  • Affiliations:
  • Department of Informatics, “József Attila” University of Szeged, Szeged, Aradi vrt. tere 1, Hungary H-6720. E-mail: kalmar@mindmaker.kfkipark.hu
  • Research Group on Artificial Intelligence, “József Attila” University of Szeged, Szeged, Aradi vrt. tere 1, Hungary H-6720. E-mail: szepes@mindmaker.kfkipark.hu
  • Department of Adaptive Systems, “József Attila” University of Szeged, Szeged, Aradi vrt. tere 1, Hungary H-6720. E-mail: lorincz@mindmaker.kfkipark.hu

  • Venue:
  • Autonomous Robots
  • Year:
  • 1998


Abstract

The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solving such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: (i) decompose the task into subtasks using the qualitative knowledge at hand; (ii) design local controllers to solve the subtasks using the available quantitative knowledge; and (iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA, and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment that were not foreseen in advance, supporting the view that adaptive algorithms are advantageous over nonadaptive ones in complex environments.
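The three-step design process in the abstract can be sketched in code. The fragment below illustrates step (iii) only: tabular Q-learning of a switching strategy over hand-designed local controllers. The corridor world, the controller names (`step_ctrl`, `dash_ctrl`), and the reward values are hypothetical stand-ins for the real robot task and do not come from the paper; the point is merely that the learner picks which pre-built module to activate in each state, rather than learning low-level control.

```python
import random

# Toy world (illustrative, not from the paper): a 1-D corridor with an
# obstacle. Two hand-designed local controllers ("modules") are given;
# Q-learning only learns WHICH controller to activate in each state.
GOAL, OBSTACLE, LENGTH = 9, 5, 10

def step_ctrl(pos):
    """Careful module: one cell forward; can pass the obstacle safely."""
    return min(pos + 1, GOAL), False

def dash_ctrl(pos):
    """Fast module: three cells forward; crashes if it crosses the obstacle."""
    nxt = pos + 3
    if pos < OBSTACLE <= nxt:
        return 0, True              # collision: thrown back to the start
    return min(nxt, GOAL), False

CONTROLLERS = [step_ctrl, dash_ctrl]

def train(episodes=3000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    """Learn a Q-table over (state, module) pairs with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(LENGTH)]
    for _ in range(episodes):
        pos = 0
        for _ in range(50):         # cap episode length
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[pos][i])
            nxt, crash = CONTROLLERS[a](pos)
            r = -5.0 if crash else (10.0 if nxt == GOAL else -1.0)
            target = r + (0.0 if nxt == GOAL else gamma * max(q[nxt]))
            q[pos][a] += alpha * (target - q[pos][a])
            pos = nxt
            if pos == GOAL:
                break
    return q

def greedy_rollout(q):
    """Follow the learnt switching strategy; return final state and module choices."""
    pos, choices = 0, []
    while pos != GOAL and len(choices) < 20:
        a = max((0, 1), key=lambda i: q[pos][i])
        choices.append(CONTROLLERS[a].__name__)
        pos, _ = CONTROLLERS[a](pos)
    return pos, choices

if __name__ == "__main__":
    q = train()
    final, choices = greedy_rollout(q)
    print(final, choices)
```

The learnt strategy dashes where it is safe and falls back to the careful module near the obstacle, mirroring the paper's claim that coordination of unfine-tuned local controllers can be learnt rather than handcrafted.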