Bayesian Reward Filtering

Authors:
Matthieu Geist; Olivier Pietquin; Gabriel Fricout
Affiliations:
Supélec, IMS Research Group, Metz, France and MCE Department, ArcelorMittal Research, Maizières-lès-Metz, France; Supélec, IMS Research Group, Metz, France; MCE Department, ArcelorMittal Research, Maizières-lès-Metz, France
Venue:
Recent Advances in Reinforcement Learning
Year:
2008

Abstract

A wide variety of function approximation schemes have been applied to reinforcement learning. However, Bayesian filtering approaches, which have proven efficient in other fields such as neural network training, have received little attention. We propose a general Bayesian filtering framework for reinforcement learning, together with a specific implementation based on sigma-point Kalman filtering and kernel machines. This yields an efficient off-policy, model-free approximate temporal-difference algorithm, which is demonstrated on two simple benchmarks.