Q-error as a selection mechanism in modular reinforcement-learning systems

  • Authors:
  • Mark Ring; Tom Schaul

  • Affiliations:
  • IDSIA, Galleria 2, Switzerland (both authors)

  • Venue:
  • IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two
  • Year:
  • 2011

Abstract

This paper introduces a novel multimodular method for reinforcement learning. A multimodular system partitions the learning task among a set of experts (modules), none of which is capable of solving the entire task alone. Splitting up large tasks this way has many advantages, but existing methods face difficulties in choosing which module(s) should contribute to the agent's actions at any particular moment. We introduce a selection mechanism in which every module, in addition to computing a set of action values, also estimates its own error for the current input. The selection mechanism combines each module's estimate of long-term reward with its self-error estimate to produce a score by which the next module is chosen. As a result, the modules use their resources effectively and divide up the task efficiently. The system is shown to learn complex tasks even when the individual modules use only linear function approximators.
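
The selection mechanism described in the abstract can be sketched concretely. Below is a minimal Python illustration, assuming each module keeps a linear Q-function plus a linear model of its own prediction error, and the selector scores each module as its best Q-value minus a weighted self-error estimate. The class and function names, the error_weight parameter, and the use of |TD error| as the error-model target are illustrative assumptions, not the paper's published formulation.

    import numpy as np

    class Module:
        """One expert: linear Q-values and a linear estimate of its own error."""
        def __init__(self, n_features, n_actions, lr=0.01):
            self.q_weights = np.zeros((n_actions, n_features))  # linear Q-function
            self.err_weights = np.zeros(n_features)             # linear self-error model
            self.lr = lr

        def q_values(self, features):
            return self.q_weights @ features

        def predicted_error(self, features):
            return self.err_weights @ features

        def update(self, features, action, td_error):
            # Standard linear TD update for the chosen action's Q-weights.
            self.q_weights[action] += self.lr * td_error * features
            # Regress the self-error model toward the observed |TD error|
            # (one plausible error target; the paper may define it differently).
            gap = abs(td_error) - self.predicted_error(features)
            self.err_weights += self.lr * gap * features

    def select_module(modules, features, error_weight=1.0):
        # Score = estimated long-term reward minus estimated self-error;
        # control goes to the highest-scoring module.
        scores = [m.q_values(features).max()
                  - error_weight * m.predicted_error(features)
                  for m in modules]
        return int(np.argmax(scores))

    # Example: three modules competing for control on one observation.
    modules = [Module(n_features=8, n_actions=4) for _ in range(3)]
    x = np.random.rand(8)
    winner = select_module(modules, x)
    action = int(np.argmax(modules[winner].q_values(x)))

Because every module reports how unreliable it expects to be on the current input, a module with optimistic but poorly fit Q-values is penalized, which is what allows the modules to carve the task into regions that each can handle with only a linear approximator.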