Risk-sensitive reinforcement learning
Most reinforcement learning algorithms optimize the expected return of a Markov Decision Problem. Practice has taught us that this criterion is not always the most suitable, because many applications require robust control strategies that also take the variance of the return into account. The classical control literature provides several techniques for risk-sensitive optimization goals, such as the worst-case optimality criterion, which focuses exclusively on risk-avoiding policies, and classical risk-sensitive control, which transforms the returns by exponential utility functions. While the first approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm.

Our risk-sensitive reinforcement learning algorithm is based on a very different philosophy. Instead of transforming the return of the process, we transform the temporal differences during learning. While our approach reflects important properties of the classical exponential utility framework, it avoids that framework's serious drawbacks for learning. Based on an extended set of optimality equations, we are able to formulate risk-sensitive versions of various well-known reinforcement learning algorithms, which converge with probability one under the usual conditions.
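The abstract leaves the transformation of the temporal differences implicit. The sketch below illustrates the idea on tabular Q-learning, assuming a piecewise-linear weighting of the TD error controlled by a risk parameter kappa in (-1, 1); the function and variable names here are illustrative, not taken from the paper. With kappa = 0 the ordinary risk-neutral Q-learning update is recovered, while kappa > 0 overweights negative surprises and so biases the learned values toward risk aversion.

```python
import numpy as np

def chi(delta, kappa):
    # Piecewise-linear transform of the TD error (illustrative): for
    # kappa > 0, positive surprises are down-weighted and negative ones
    # up-weighted (risk-averse); kappa = 0 gives the standard update.
    return (1.0 - kappa) * delta if delta > 0.0 else (1.0 + kappa) * delta

def q_update(Q, s, a, r, s_next, alpha, gamma, kappa):
    # One tabular Q-learning step in which the temporal difference,
    # rather than the return, is transformed before being applied.
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * chi(delta, kappa)

# Toy usage: a 5-state, 2-action table updated from a single transition.
Q = np.zeros((5, 2))
q_update(Q, s=0, a=1, r=-1.0, s_next=3, alpha=0.1, gamma=0.95, kappa=0.5)
```

Because only the update rule changes, the same weighting can be dropped into TD(0) or SARSA in the same spirit, which is consistent with the abstract's claim that risk-sensitive versions of various well-known algorithms follow from one extended set of optimality equations.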