Gradient algorithms for exploration/exploitation trade-offs: global and local variants
ANNPR'12 Proceedings of the 5th INNS IAPR TC 3 GIRPR conference on Artificial Neural Networks in Pattern Recognition
Stochastic neurons are used to adapt exploration parameters efficiently via gradient-following algorithms. The approach is evaluated in model-free temporal-difference learning with discrete actions. Its particular advantage is memory efficiency, because exploratory data needs to be memorized only for starting states. Hence, if a learning problem consists of only one starting state, the exploratory data can be considered global. Results suggest that the presented approach can be efficiently combined with standard off- and on-policy algorithms such as Q-learning and Sarsa.
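The abstract's idea can be illustrated with a minimal sketch: a single stochastic (Bernoulli) neuron decides on each step whether to explore, and its weight is adapted by a REINFORCE-style gradient-following rule on the episodic return, while Q-learning handles the value updates. This is not the paper's implementation — the chain MDP, the single sigmoid neuron, and all parameter values below are illustrative assumptions; note that because the problem has a single starting state, one global exploration parameter suffices, as the abstract points out.

```python
import math
import random

random.seed(0)

# Toy chain MDP (illustrative, not from the paper): states 0..4,
# actions 0 = left, 1 = right; reward 1.0 on reaching the right end.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
w = 0.0  # weight of the stochastic exploration neuron; global, since
         # this problem has a single starting state (state 0)
alpha, beta, gamma = 0.5, 0.1, 0.95  # assumed step sizes / discount

def explore_prob():
    # Firing probability of the Bernoulli neuron: sigmoid of its weight.
    return 1.0 / (1.0 + math.exp(-w))

for episode in range(200):
    s, done, ret, decisions = 0, False, 0.0, []
    while not done:
        p = explore_prob()
        fired = 1 if random.random() < p else 0  # neuron fires -> explore
        decisions.append((fired, p))
        a = random.randrange(2) if fired else (0 if Q[s][0] > Q[s][1] else 1)
        s2, r, done = step(s, a)
        # Standard off-policy Q-learning update.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        ret += r
        s = s2
    # Gradient-following (REINFORCE-style) update of the exploration
    # weight: each firing decision is reinforced by the episodic return.
    for fired, p in decisions:
        w += beta * ret * (fired - p)
```

After training, the greedy policy prefers moving right in every non-terminal state, while the exploration probability remains a learned, state-independent quantity — the "global" variant of exploratory data the abstract refers to. A local variant would keep one such neuron per starting state instead.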