An adaptive architecture for modular Q-learning

  • Authors:
  • Takayuki Kohri; Kei Matsubayashi; Mario Tokoro

  • Affiliations:
  • Takayuki Kohri: Sony Computer Science Laboratory Inc., Shinagawa-ku, Tokyo, Japan; Department of Computer Science, Faculty of Science and Technology, Keio University, Yokohama, Japan
  • Kei Matsubayashi: Faculty of Science and Technology, Keio University, Kawasaki, Japan; Department of Computer Science, Faculty of Science and Technology, Keio University, Yokohama, Japan
  • Mario Tokoro: Faculty of Science and Technology, Keio University, Kawasaki, Japan; Department of Computer Science, Faculty of Science and Technology, Keio University, Yokohama, Japan

  • Venue:
  • IJCAI'97 Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1997

Abstract

Reinforcement learning is a technique for learning action policies that maximize utility, guided by reinforcement signals: reward or punishment. Q-learning, a widely used reinforcement learning method, has been studied extensively in research on autonomous agents. However, as the size of the problem space increases, agents need more computational resources and require more time to learn appropriate policies. Whitehead proposed an architecture called modular Q-learning, which decomposes the whole problem space into smaller subproblem spaces and distributes them among multiple modules, so that each module takes charge of part of the whole problem. In modular Q-learning, however, human designers have to decompose the problem space and create a suitable set of modules manually. Agents with such a fixed module architecture cannot adapt themselves to dynamic environments. Here, we propose a new architecture for reinforcement learning called AMQL (Automatic Modular Q-Learning), which enables agents to obtain a suitable set of modules by themselves using a selection method. Through experiments, we show that agents can automatically obtain suitable modules to gain a reward. Furthermore, we show that agents can adapt themselves to dynamic environments efficiently by reconstructing their modules.
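
The abstract describes modular Q-learning only at a high level, so the sketch below illustrates the basic idea in Python: each module is an ordinary tabular Q-learner over its own sub-state, and the agent merges the modules' Q values to choose an action. The greatest-mass merging rule (summing Q values across modules), the epsilon-greedy exploration, and all names here are illustrative assumptions rather than the paper's AMQL formulation; AMQL's contribution is that the set of modules itself is constructed and reconstructed automatically by a selection method, which is not reproduced in this sketch.

```python
import random
from collections import defaultdict

class QModule:
    """One tabular Q-learner responsible for its own sub-state of the problem."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # maps (sub_state, action) -> Q value
        self.n_actions = n_actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def values(self, sub_state):
        return [self.q[(sub_state, a)] for a in range(self.n_actions)]

    def update(self, sub_state, action, reward, next_sub_state):
        # Standard one-step Q-learning update on this module's local view.
        best_next = max(self.values(next_sub_state))
        td_target = reward + self.gamma * best_next
        self.q[(sub_state, action)] += self.alpha * (td_target - self.q[(sub_state, action)])

def select_action(modules, sub_states, epsilon=0.1):
    """Merge module opinions by summing their Q values, then act epsilon-greedily."""
    n_actions = modules[0].n_actions
    if random.random() < epsilon:
        return random.randrange(n_actions)
    totals = [sum(m.values(s)[a] for m, s in zip(modules, sub_states))
              for a in range(n_actions)]
    return max(range(n_actions), key=totals.__getitem__)

def learn_step(modules, sub_states, action, reward, next_sub_states):
    """Each module updates its own Q table from its local view of the shared transition."""
    for m, s, s_next in zip(modules, sub_states, next_sub_states):
        m.update(s, action, reward, s_next)
```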