Reinforcement learning with replacing eligibility traces
Machine Learning - Special issue on reinforcement learning
Introduction to Reinforcement Learning
Efficient Exploration In Reinforcement Learning
Using confidence bounds for exploitation-exploration trade-offs
The Journal of Machine Learning Research
Application of reinforcement learning to the game of Othello
Computers and Operations Research
Learning a Strategy with Neural Approximated Temporal-Difference Methods in English Draughts
ICPR '10: Proceedings of the 20th International Conference on Pattern Recognition
Value-difference based exploration: adaptive control between epsilon-greedy and softmax
KI'11: Proceedings of the 34th Annual German Conference on Advances in Artificial Intelligence
Adaptive exploration using stochastic neurons
ICANN'12: Proceedings of the 22nd International Conference on Artificial Neural Networks and Machine Learning, Volume Part II
Gradient-following algorithms are applied to the efficient adaptation of exploration parameters in temporal-difference learning with discrete action spaces. Global and local variants are evaluated in discrete and continuous state spaces. The global variant is memory-efficient, requiring exploratory data only for the starting states; the local variant, in contrast, requires exploratory data for each state of the state space, but produces exploratory behavior only in states with potential for improvement. Our results suggest that gradient-based exploration can be used efficiently in combination with both off-policy and on-policy algorithms such as Q-learning and Sarsa.
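The idea in the abstract can be illustrated with a minimal sketch. Here a single global exploration parameter (a softmax inverse temperature, here called beta) is adapted by following a REINFORCE-style score-function gradient weighted by the TD error, alongside an ordinary Q-learning update. The class and parameter names, the specific gradient estimator, and the two-armed bandit test problem are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def softmax(q, beta):
    """Boltzmann action distribution with inverse temperature beta."""
    z = beta * (q - q.max())          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

class GradientExplorationQ:
    """Q-learning with one global exploration parameter beta, adapted
    by gradient-following: the TD error is multiplied by the score
    function d/d_beta log pi(a|s) of the softmax policy (sketch)."""

    def __init__(self, n_states, n_actions, alpha=0.2, gamma=0.9,
                 beta=1.0, beta_lr=0.005):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.beta, self.beta_lr = beta, beta_lr

    def act(self, s, rng):
        p = softmax(self.Q[s], self.beta)
        return int(rng.choice(len(p), p=p))

    def update(self, s, a, r, s_next, done):
        # Standard off-policy Q-learning update.
        target = r if done else r + self.gamma * self.Q[s_next].max()
        delta = target - self.Q[s, a]
        self.Q[s, a] += self.alpha * delta

        # Score function of the softmax policy with respect to beta:
        # d log pi(a|s) / d beta = Q(s,a) - E_pi[Q(s,.)]
        p = softmax(self.Q[s], self.beta)
        score = self.Q[s, a] - p @ self.Q[s]
        self.beta += self.beta_lr * delta * score
        self.beta = max(self.beta, 1e-2)   # keep a little exploration

# Usage: a two-armed bandit (one state, arm 0 pays 1, arm 1 pays 0).
rng = np.random.default_rng(0)
agent = GradientExplorationQ(n_states=1, n_actions=2, gamma=0.0)
for _ in range(2000):
    a = agent.act(0, rng)
    r = 1.0 if a == 0 else 0.0
    agent.update(0, a, r, 0, done=True)
```

As the value estimates become reliable, positive TD errors on the better arm push beta upward, so exploration decreases automatically; a local variant would instead keep one beta per state. The same adaptation rule could be paired with an on-policy Sarsa target in place of the max in `update`.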