Robust Reinforcement Learning

  • Authors:
  • Jun Morimoto; Kenji Doya

  • Affiliations:
  • Computational Brain Project, ICORP, JST, Soraku-gun, Kyoto 619-0288, Japan / ATR Computational Neuroscience Laboratories, Soraku-gun, Kyoto 619-0288, Japan; ATR Computational Neuroscience Laboratories, Soraku-gun, Kyoto 619-0288, Japan / Initial Research Project, OIST, Gushikawa, Okinawa 904-2234, Japan / and CREST, JST, Soraku-gun, Kyoto 619-0288, Japan ...

  • Venue:
  • Neural Computation
  • Year:
  • 2005

Abstract

This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular, both for offline learning by simulation and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control with reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by the online algorithms coincided with those derived analytically from linear H∞ control theory. For the fully nonlinear swing-up task, RRL achieved robust performance under changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not cope with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
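
The abstract describes the scheme only at a high level. Below is a minimal, self-contained sketch, not the authors' exact algorithm, of the kind of actor-disturber-critic loop it outlines, applied to a linearized inverted pendulum with an input disturbance entering through the control channel. The quadratic features, the noise-correlation policy updates, the dynamics constants, and the names GAMMA_ROB, ALPHA_C, ALPHA_A, and ALPHA_D are illustrative assumptions; the augmented reward adds a gamma-squared disturbance term in the spirit of the H∞ formulation.

import numpy as np

rng = np.random.default_rng(0)

# Linearized inverted pendulum, x = [angle, angular velocity], Euler step DT.
# All constants here are illustrative, not taken from the paper.
DT, GRAV, POLE_LEN, MASS = 0.02, 9.81, 1.0, 1.0
A = np.array([[1.0, DT],
              [GRAV / POLE_LEN * DT, 1.0]])
B = np.array([0.0, DT / (MASS * POLE_LEN ** 2)])

Q = np.diag([1.0, 0.1])   # state cost weights
R = 0.1                   # control cost weight
GAMMA = 0.98              # discount factor
GAMMA_ROB = 2.0           # disturbance attenuation level (H-infinity gamma)

def phi(x):
    # Quadratic features so that V(x) = theta . phi(x) can represent
    # the quadratic value of the linear game.
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

theta = np.zeros(3)        # critic weights
K = np.zeros(2)            # control policy:    u = K . x + exploration noise
Lw = np.zeros(2)           # disturber policy:  w = Lw . x + exploration noise
ALPHA_C, ALPHA_A, ALPHA_D = 0.05, 0.005, 0.005

for episode in range(500):
    x = rng.normal(scale=0.1, size=2)              # start near upright
    for step in range(200):
        nu = rng.normal(scale=0.5)                 # control exploration noise
        nw = rng.normal(scale=0.5)                 # disturber exploration noise
        u = K @ x + nu
        w = Lw @ x + nw

        # The disturbance enters through the same channel as the control input.
        x_next = A @ x + B * (u + w)

        # Augmented reward: the usual (negative) quadratic cost plus a
        # gamma^2 |w|^2 term that charges the disturber for its own power,
        # mirroring the H-infinity objective.
        r = -(x @ Q @ x + R * u ** 2) + GAMMA_ROB ** 2 * w ** 2

        # Temporal-difference error and critic update.
        delta = r + GAMMA * theta @ phi(x_next) - theta @ phi(x)
        theta += ALPHA_C * delta * phi(x)

        # Noise-correlation actor updates: the control agent ascends the
        # value while the disturbing agent descends it (min-max game).
        K += ALPHA_A * delta * nu * x
        Lw -= ALPHA_D * delta * nw * x

        x = x_next
        if abs(x[0]) > 0.8 or abs(x[1]) > 4.0:     # fell over: end the episode
            break

print("control gain K:", K)
print("disturbance gain Lw:", Lw)

In the linear-quadratic setting sketched here, the learned gains can be compared against the state-feedback solution of the corresponding H∞ (game) Riccati equation, which is the check the abstract reports for the linear domain.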