A competitive strategy for function approximation in Q-learning

Authors:
Alejandro Agostini;Enric Celaya
Affiliations:
Institut de Robòtica i Informàtica Industrial, CSIC, UPC, Barcelona, Spain;Institut de Robòtica i Informàtica Industrial, CSIC, UPC, Barcelona, Spain
Venue:
IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two
Year:
2011

Abstract

In this work, we propose an approach to generalization in continuous-domain Reinforcement Learning that, instead of using a single function approximator, tries many different function approximators in parallel, each defined in a different region of the domain. Each approximator is associated with a relevance function that locally quantifies the quality of its approximation, so that, at each input point, the approximator with the highest relevance can be selected. The relevance function is defined using parametric estimates of the variance of the Q-values and of the density of samples in the input space, which quantify the accuracy of the approximation and the confidence in it, respectively. These parametric estimates are obtained from a probability density function, represented as a Gaussian Mixture Model (GMM), embedded in the input-output space of each approximator. In our experiments, the proposed approach required fewer experiences to learn and produced more stable convergence profiles than a single function approximator.
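The abstract does not state the exact relevance formula, so the following Python sketch only illustrates the general mechanism under stated assumptions. Each approximator carries a joint GMM over (input, Q-value) space. The input density p(x) and the conditional variance Var[q | x] are read off it by standard Gaussian mixture regression. A hypothetical `relevance` score (log density minus log variance; our assumption, not the authors' definition) rewards dense, low-variance regions, and `select` returns the approximator with the highest score at a query point. The GMM parameters are hand-set; fitting them from experience is not shown. The names `JointGMM`, `relevance`, and `select` are illustrative.

```python
import numpy as np

class JointGMM:
    """Gaussian mixture over the joint (x, q) space; the last coordinate is q."""

    def __init__(self, weights, means, covs):
        self.w = np.asarray(weights, dtype=float)  # (K,)
        self.mu = np.asarray(means, dtype=float)   # (K, d+1)
        self.cov = np.asarray(covs, dtype=float)   # (K, d+1, d+1)

    def _input_densities(self, x):
        """Weighted marginal densities w_k * N(x; mu_k^x, Sigma_k^xx), shape (K,)."""
        d = x.shape[0]
        out = np.empty(len(self.w))
        for k in range(len(self.w)):
            diff = x - self.mu[k, :d]
            s_xx = self.cov[k, :d, :d]
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(s_xx))
            out[k] = self.w[k] * np.exp(-0.5 * diff @ np.linalg.solve(s_xx, diff)) / norm
        return out

    def density(self, x):
        """Marginal sample density p(x) in the input space."""
        return float(self._input_densities(np.asarray(x, dtype=float)).sum())

    def predict(self, x):
        """Conditional mean and variance of q given x (Gaussian mixture regression)."""
        x = np.asarray(x, dtype=float)
        d = x.shape[0]
        h = self._input_densities(x)
        h = h / h.sum()  # component responsibilities at x
        means = np.empty(len(self.w))
        variances = np.empty(len(self.w))
        for k in range(len(self.w)):
            s_xx = self.cov[k, :d, :d]
            s_qx = self.cov[k, d, :d]
            gain = np.linalg.solve(s_xx, s_qx)  # Sigma_xx^{-1} Sigma_xq
            means[k] = self.mu[k, d] + gain @ (x - self.mu[k, :d])
            variances[k] = self.cov[k, d, d] - s_qx @ gain
        mean = h @ means
        # Law of total variance across mixture components.
        var = h @ (variances + means ** 2) - mean ** 2
        return float(mean), float(var)

def relevance(gmm, x, eps=1e-12):
    """HYPOTHETICAL score: higher sample density and lower q-variance rank higher."""
    p = gmm.density(x)
    if p < eps:
        return -np.inf  # no confidence where the approximator has seen no samples
    _, var = gmm.predict(x)
    return float(np.log(p) - np.log(var + eps))

def select(approximators, x):
    """Competitive selection: index and Q-estimate of the most relevant approximator."""
    scores = [relevance(g, x) for g in approximators]
    best = int(np.argmax(scores))
    return best, approximators[best].predict(x)[0]

# Usage: two approximators covering different regions of a 1-D input space.
g1 = JointGMM([1.0], [[0.0, 1.0]], [[[1.0, 0.5], [0.5, 1.0]]])
g2 = JointGMM([1.0], [[5.0, 2.0]], [[[1.0, 0.0], [0.0, 0.25]]])
approximators = [g1, g2]
print(select(approximators, [0.2]))
print(select(approximators, [4.8]))
```

With these hand-set parameters, the query at x = 0.2 lies in the first approximator's region and is answered by it (Q-estimate about 1.1), while x = 4.8 is answered by the second (about 2.0). Scoring in log space keeps the comparison stable when a density is very small far from an approximator's samples.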