Module-Based Reinforcement Learning: Experiments with a Real Robot

  • Authors:
  • Zsolt Kalmár, Csaba Szepesvári, András Lörincz

  • Affiliations:
  • Department of Informatics, “József Attila” University of Szeged, Szeged, Aradi vrt. tere 1, Hungary H-6720. E-mail: kalmar@mindmaker.kfkipark.hu
  • Research Group on Artificial Intelligence, “József Attila” University of Szeged, Szeged, Aradi vrt. tere 1, Hungary H-6720. E-mail: szepes@mindmaker.kfkipark.hu
  • Department of Adaptive Systems, “József Attila” University of Szeged, Szeged, Aradi vrt. tere 1, Hungary H-6720. E-mail: lorincz@mindmaker.kfkipark.hu

  • Venue:
  • Machine Learning - Special issue on learning in autonomous robots
  • Year:
  • 1998


Abstract

The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solve such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: i) decompose the task into subtasks using the qualitative knowledge at hand; ii) design local controllers to solve the subtasks using the available quantitative knowledge; and iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA, and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over non-adaptive ones in complex environments.
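To make the three-step design process concrete, the sketch below shows one plausible reading of step iii): a tabular Q-learning agent that learns which hand-designed local controller (module) to activate in each high-level feature state. This is only an illustrative sketch, not the authors' implementation; the ModuleSwitcher class, the discrete feature-state abstraction, and the epsilon-greedy one-step Q-learning update are all assumptions, and the paper's better-performing model-based variant is not shown.

```python
# Hypothetical sketch of learning to coordinate fixed local controllers.
# Assumption: the robot's sensors are abstracted into a discrete, hashable
# "feature state", and each module is a hand-designed controller that runs
# until its own termination condition fires.
import random
from collections import defaultdict

class ModuleSwitcher:
    def __init__(self, n_modules, epsilon=0.1, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)   # Q-values over (feature_state, module)
        self.n_modules = n_modules
        self.epsilon = epsilon        # exploration rate
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def select(self, state):
        # Epsilon-greedy choice among the local controllers.
        if random.random() < self.epsilon:
            return random.randrange(self.n_modules)
        return max(range(self.n_modules), key=lambda m: self.q[(state, m)])

    def update(self, state, module, reward, next_state):
        # One-step Q-learning update at the switching level, applied when
        # the chosen module terminates and control returns to the switcher.
        best_next = max(self.q[(next_state, m)] for m in range(self.n_modules))
        td_target = reward + self.gamma * best_next
        self.q[(state, module)] += self.alpha * (td_target - self.q[(state, module)])
```

In this reading, the learner only decides *which* controller runs next, never the low-level motor commands, which is what keeps the learning problem small and makes fine-tuning of the local controllers unnecessary; a model-based variant would additionally estimate the transition and reward structure over feature states and plan against it rather than relying on the sampled update above.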