An adaptive architecture for modular Q-learning

  • Authors:
  • Takayuki Kohri; Kei Matsubayashi; Mario Tokoro

  • Affiliations:
  • Takayuki Kohri: Sony Computer Science Laboratory Inc., Shinagawa-ku, Tokyo, Japan; Department of Computer Science, Faculty of Science and Technology, Keio University, Yokohama, Japan
  • Kei Matsubayashi: Faculty of Science and Technology, Keio University, Kawasaki, Japan; Department of Computer Science, Faculty of Science and Technology, Keio University, Yokohama, Japan
  • Mario Tokoro: Faculty of Science and Technology, Keio University, Kawasaki, Japan; Department of Computer Science, Faculty of Science and Technology, Keio University, Yokohama, Japan

  • Venue:
  • IJCAI'97 Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1997

Abstract

Reinforcement learning is a technique for learning action policies that maximize utility, guided by reinforcement signals: reward or punishment. Q-learning, a widely used reinforcement learning method, has been studied extensively in research on autonomous agents. However, as the size of the problem space increases, agents need more computational resources and require more time to learn appropriate policies. Whitehead proposed an architecture called modular Q-learning, which decomposes the whole problem space into smaller subproblem spaces and distributes them among multiple modules, so that each module takes charge of part of the whole problem. In modular Q-learning, however, human designers have to decompose the problem space and create a suitable set of modules manually. Agents with such a fixed module architecture cannot adapt themselves to dynamic environments. Here, we propose a new architecture for reinforcement learning called AMQL (Automatic Modular Q-Learning), which enables agents to obtain a suitable set of modules by themselves using a selection method. Through experiments, we show that agents can automatically obtain suitable modules to gain a reward. Furthermore, we show that agents can adapt themselves to dynamic environments efficiently by reconstructing their modules.
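
The abstract describes modular Q-learning only at a high level, so the sketch below illustrates the basic idea in Python: each module is an ordinary tabular Q-learner over its own sub-state, and the agent merges the modules' Q values to choose an action. The greatest-mass merging rule (summing Q values across modules), the epsilon-greedy exploration, and all names here are illustrative assumptions rather than the paper's AMQL formulation; AMQL's contribution is that the set of modules itself is constructed and reconstructed automatically by a selection method, which is not reproduced in this sketch.

```python
import random
from collections import defaultdict

class QModule:
    """One tabular Q-learner responsible for its own sub-state of the problem."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # maps (sub_state, action) -> Q value
        self.n_actions = n_actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def values(self, sub_state):
        return [self.q[(sub_state, a)] for a in range(self.n_actions)]

    def update(self, sub_state, action, reward, next_sub_state):
        # Standard one-step Q-learning update on this module's local view.
        best_next = max(self.values(next_sub_state))
        td_target = reward + self.gamma * best_next
        self.q[(sub_state, action)] += self.alpha * (td_target - self.q[(sub_state, action)])

def select_action(modules, sub_states, epsilon=0.1):
    """Merge module opinions by summing their Q values, then act epsilon-greedily."""
    n_actions = modules[0].n_actions
    if random.random() < epsilon:
        return random.randrange(n_actions)
    totals = [sum(m.values(s)[a] for m, s in zip(modules, sub_states))
              for a in range(n_actions)]
    return max(range(n_actions), key=totals.__getitem__)

def learn_step(modules, sub_states, action, reward, next_sub_states):
    """Each module updates its own Q table from its local view of the shared transition."""
    for m, s, s_next in zip(modules, sub_states, next_sub_states):
        m.update(s, action, reward, s_next)
```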