This paper introduces a novel multimodular method for reinforcement learning. A multimodular system partitions the learning task among a set of experts (modules), where no single expert is capable of solving the entire task by itself. Splitting up large tasks in this way has many advantages, but existing methods face difficulties in choosing which module(s) should contribute to the agent's actions at any particular moment. We introduce a novel selection mechanism in which every module, in addition to calculating a set of action values, also estimates its own error for the current input. The selection mechanism combines each module's estimate of long-term reward with its self-error estimate to produce a score by which the next module is chosen. As a result, the modules use their limited resources effectively and divide the task among themselves efficiently. The system is shown to learn complex tasks even when the individual modules use only linear function approximators.
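A minimal sketch of the selection mechanism described above, assuming linear modules. The additive score (maximum action value minus predicted self-error) and the rule that trains the self-error estimate toward the magnitude of the TD error are illustrative assumptions, not the paper's exact formulation; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class Module:
    """A linear expert: estimates action values and its own prediction error."""
    def __init__(self, n_features, n_actions, rng):
        # Linear Q-value approximator, one weight row per action.
        self.q_weights = rng.normal(scale=0.1, size=(n_actions, n_features))
        # Linear self-error estimator, initially predicting zero error.
        self.err_weights = np.zeros(n_features)

    def action_values(self, x):
        return self.q_weights @ x

    def self_error(self, x):
        return float(self.err_weights @ x)

def select_module(modules, x):
    """Pick the module with the best score: value estimate minus predicted
    self-error (an assumed combination for illustration)."""
    scores = [float(m.action_values(x).max()) - m.self_error(x) for m in modules]
    return int(np.argmax(scores))

def update_self_error(module, x, td_error, lr=0.1):
    """Move the module's self-error estimate toward |TD error| (illustrative)."""
    module.err_weights += lr * (abs(td_error) - module.self_error(x)) * x

# Usage: three competing modules on a 4-feature, 2-action problem.
modules = [Module(4, 2, rng) for _ in range(3)]
x = rng.normal(size=4)
chosen = select_module(modules, x)
update_self_error(modules[chosen], x, td_error=1.0)
```

A module that repeatedly incurs large TD errors on some inputs learns a high self-error there, lowering its score and ceding those inputs to other modules, which is how the division of the task emerges.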