Inter-module credit assignment in modular reinforcement learning

Authors:
Kazuyuki Samejima; Kenji Doya; Mitsuo Kawato
Affiliations:
Human information science laboratories, ATR International, 2-2-2 Hikaridai, Seika, Soraku, Kyoto 619-0238, Japan and Creating the Brain, CREST, Japan Science and Technology Corporation, 2-2-2 Hikari ...; Human information science laboratories, ATR International, 2-2-2 Hikaridai, Seika, Soraku, Kyoto 619-0238, Japan and Creating the Brain, CREST, Japan Science and Technology Corporation, 2-2-2 Hikari ...; Human information science laboratories, ATR International, 2-2-2 Hikaridai, Seika, Soraku, Kyoto 619-0238, Japan
Venue:
Neural Networks
Year:
2003

Abstract

Critical issues in modular or hierarchical reinforcement learning (RL) are (i) how to decompose a task into sub-tasks, (ii) how to learn the sub-tasks independently of one another, and (iii) how to ensure that the composite policy is optimal for the entire task. The second and third requirements often trade off against each other. We propose a method for propagating the reward for achieving the entire task between modules. This is done in the form of a 'modular reward', which is calculated from the temporal difference of the module gating signal and the value of the succeeding module. We implement the modular reward in a multiple model-based reinforcement learning (MMRL) architecture and show its effectiveness in simulations of a pursuit task with hidden states and a continuous-time non-linear control task.
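
The abstract gives only a verbal description of the modular reward, so the following is a minimal sketch under stated assumptions, not the paper's equation. It assumes that a module is credited when its gating signal drops over one time step, and that the credit is scaled by a gating-weighted value estimate of the modules taking over. The function name `modular_reward`, its arguments, and this specific formula are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def modular_reward(
    gate_now: Sequence[float],
    gate_next: Sequence[float],
    value_next: Sequence[float],
) -> List[float]:
    """Hypothetical per-module modular reward for one time step.

    ASSUMPTION (not taken from the paper): a module that loses
    responsibility between t and t+1 is credited with the size of that
    drop times the value of the controller that succeeds it.
    """
    n = len(gate_now)
    if len(gate_next) != n or len(value_next) != n:
        raise ValueError("all inputs must have one entry per module")

    # Value of the succeeding controller: next-step module values,
    # weighted by the next-step gating signal (assumed form).
    total_gate = sum(gate_next)
    if total_gate > 0:
        successor_value = (
            sum(g * v for g, v in zip(gate_next, value_next)) / total_gate
        )
    else:
        successor_value = 0.0

    rewards = []
    for g0, g1 in zip(gate_now, gate_next):
        # Temporal difference of the gating signal; only a decrease
        # (a hand-off to another module) earns credit in this sketch.
        handoff = max(0.0, g0 - g1)
        rewards.append(handoff * successor_value)
    return rewards


# Module 0 hands control to module 1, whose value estimate is 2.0.
print(modular_reward([1.0, 0.0], [0.0, 1.0], [0.0, 2.0]))
```

In this sketch a module that keeps its responsibility unchanged receives no modular reward, so credit flows only at hand-off points between modules.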