Uncertainty Propagation for Efficient Exploration in Reinforcement Learning
Proceedings of ECAI 2010: 19th European Conference on Artificial Intelligence
In a typical reinforcement learning (RL) setting, the details of the environment are not given explicitly but have to be estimated from observations. Most RL approaches optimize only the expected value. However, if the number of observations is limited, considering expected values alone can lead to false conclusions. Instead, it is crucial to also account for the estimators' uncertainties. In this paper, we present a method that incorporates those uncertainties and propagates them to the conclusions. Because it is only approximate, the method is computationally feasible. Furthermore, we describe a Bayesian approach to designing the estimators. Our experiments show that the method considerably increases the robustness of the derived policies compared to the standard approach.
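
To make the idea concrete, below is a minimal Python sketch of the kind of procedure the abstract describes: a tabular MDP is estimated with Bayesian estimators (a Dirichlet posterior over transitions, sample means for rewards), the estimators' variances are propagated through value iteration with a first-order, diagonal-covariance approximation, and a robust policy is derived from a pessimistic value estimate. The function names, the hyperparameters alpha0, xi, gamma, and n_iter, and the diagonal simplification are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def estimate_mdp(counts, rewards_sum, rewards_sq_sum, alpha0=1.0):
    """Bayesian estimators for a tabular MDP (illustrative assumptions).

    counts[s, a, s']      -- observed transition counts
    rewards_sum[s, a]     -- sum of observed rewards for (s, a)
    rewards_sq_sum[s, a]  -- sum of squared observed rewards for (s, a)
    """
    alpha = counts + alpha0                            # Dirichlet posterior parameters
    alpha_sum = alpha.sum(axis=2, keepdims=True)
    p_mean = alpha / alpha_sum                         # posterior mean of transitions
    p_var = p_mean * (1 - p_mean) / (alpha_sum + 1)    # marginal Dirichlet variance
    n = np.maximum(counts.sum(axis=2), 1)              # visit counts, avoid division by zero
    r_mean = rewards_sum / n                           # sample-mean reward estimate
    r_var = np.maximum(rewards_sq_sum / n - r_mean**2, 0) / n  # variance of the mean
    return p_mean, p_var, r_mean, r_var

def uncertainty_propagating_vi(p_mean, p_var, r_mean, r_var,
                               gamma=0.95, n_iter=200):
    """Value iteration that carries an approximate (diagonal) variance of Q."""
    n_s, n_a, _ = p_mean.shape
    q = np.zeros((n_s, n_a))
    q_var = np.zeros((n_s, n_a))
    for _ in range(n_iter):
        greedy = q.argmax(axis=1)
        v = q.max(axis=1)                              # greedy successor values
        v_var = np.take_along_axis(q_var, greedy[:, None], axis=1)[:, 0]
        # First-order propagation: variance of Q from reward noise,
        # transition-estimate noise, and discounted successor uncertainty.
        q_var = (r_var
                 + gamma**2 * (p_var * v**2).sum(axis=2)
                 + gamma**2 * (p_mean**2 * v_var).sum(axis=2))
        q = r_mean + gamma * (p_mean * v).sum(axis=2)  # standard Bellman update
    return q, q_var

def robust_policy(q, q_var, xi=1.0):
    """Select actions by a pessimistic estimate Q - xi * sigma(Q)."""
    return np.argmax(q - xi * np.sqrt(q_var), axis=1)

Subtracting xi times the propagated standard deviation penalizes actions whose value estimates rest on few observations, which is one way to obtain the increased robustness over the purely expectation-based approach that the abstract reports.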