MOSAIC for multiple-reward environments

Authors:
Norikazu Sugimoto; Masahiko Haruno; Kenji Doya; Mitsuo Kawato
Venue:
Neural Computation
Year:
2012


Abstract

Reinforcement learning (RL) provides a basic framework in which autonomous robots can learn control policies that maximize future cumulative reward in complex environments. To achieve high performance, RL controllers must take into account both the complex external dynamics of movement and the task (the reward function) when optimizing control commands. For example, a robot that plays both tennis and squash must cope with the different dynamics of a tennis or squash racket and with dynamic environmental factors such as wind. At the same time, the robot has to tailor its tactics to the rules of whichever game it is playing. This double complexity of external dynamics and reward function becomes harder still when both the dynamics and the reward functions switch implicitly, as in a real (multi-agent) game of tennis in which a player cannot observe the intentions of her opponents or her partner. The robot must therefore take into account the unobservable behavioral goals (reward functions) of its opponents and its partner. In this article, we address how an RL agent should be designed to handle this double complexity of dynamics and reward. We previously proposed modular selection and identification for control (MOSAIC) to cope with nonstationary dynamics; in MOSAIC, appropriate controllers are selected and learned among many candidates based on the error of each controller's paired dynamics predictor, the forward model. Here we extend this framework to RL and propose the MOSAIC-MR architecture. It resembles MOSAIC in spirit: it selects and learns an appropriate RL controller based on that controller's temporal-difference (TD) error, using the errors of the dynamics predictors (the forward models) and the reward predictors. Furthermore, unlike other MOSAIC variants for RL, the RL controllers are not paired a priori with fixed predictors of dynamics and reward. Simulation results demonstrate that MOSAIC-MR outperforms its counterparts because of this flexible association among RL controllers, forward models, and reward predictors.
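The selection idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one possible reading in which forward models and reward predictors are weighted by a softmax over their negative squared prediction errors (a responsibility-style weighting), their blended predictions feed a one-step TD error for each controller, and controllers are then weighted by the same softmax over their TD errors. The scalar state, the linear value functions, the function names, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def responsibilities(errors, sigma=1.0):
    """Softmax over negative squared errors: smaller error, larger weight."""
    scores = [-(e ** 2) / (2.0 * sigma ** 2) for e in errors]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def td_error(reward, value_now, value_next, gamma=0.9):
    """One-step temporal-difference error for a single value function."""
    return reward + gamma * value_next - value_now


# Hypothetical scalar state and recent prediction errors of each predictor.
state = 0.8
forward_resp = responsibilities([0.1, 1.5])  # two forward models
reward_resp = responsibilities([2.0, 0.2])   # two reward predictors

# Hypothetical predictions, blended by responsibility. Neither set of
# predictors is tied to a particular controller in this sketch.
next_state_predictions = [1.0, 3.0]
reward_predictions = [0.0, 1.0]
next_state = sum(w * s for w, s in zip(forward_resp, next_state_predictions))
reward = sum(w * r for w, r in zip(reward_resp, reward_predictions))

# Three hypothetical controllers, each with a linear value function V(s) = w * s.
value_weights = [0.5, 2.0, -1.0]
td_errors = [td_error(reward, w * state, w * next_state) for w in value_weights]

# The controller whose value function is most consistent with the blended
# predictions (smallest squared TD error) receives the largest weight.
controller_resp = responsibilities(td_errors)
selected = max(range(len(controller_resp)), key=controller_resp.__getitem__)
print("controller weights:", controller_resp, "selected index:", selected)
```

Because the controller weights depend only on TD errors computed from the blended predictions, any controller can be combined with any forward model or reward predictor, which is the kind of flexible association the abstract contrasts with fixed a priori pairing.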