Gradient algorithms for exploration/exploitation trade-offs: global and local variants
ANNPR'12 Proceedings of the 5th INNS IAPR TC 3 GIRPR conference on Artificial Neural Networks in Pattern Recognition
Stochastic neurons are used to adapt exploration parameters efficiently via gradient-following algorithms. The approach is evaluated in model-free temporal-difference learning with discrete actions. Its particular advantage is memory efficiency, because exploratory data needs to be memorized only for starting states. Hence, if a learning problem consists of only one starting state, the exploratory data can be considered global. Results suggest that the presented approach can be efficiently combined with standard off- and on-policy algorithms such as Q-learning and Sarsa.
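The abstract's idea can be illustrated with a minimal sketch: a single stochastic (Bernoulli) neuron decides on each step whether to explore, and its weight is adapted by a REINFORCE-style gradient-following rule on the episodic return, while Q-learning handles the value updates. This is not the paper's implementation — the chain MDP, the single sigmoid neuron, and all parameter values below are illustrative assumptions; note that because the problem has a single starting state, one global exploration parameter suffices, as the abstract points out.

```python
import math
import random

random.seed(0)

# Toy chain MDP (illustrative, not from the paper): states 0..4,
# actions 0 = left, 1 = right; reward 1.0 on reaching the right end.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
w = 0.0  # weight of the stochastic exploration neuron; global, since
         # this problem has a single starting state (state 0)
alpha, beta, gamma = 0.5, 0.1, 0.95  # assumed step sizes / discount

def explore_prob():
    # Firing probability of the Bernoulli neuron: sigmoid of its weight.
    return 1.0 / (1.0 + math.exp(-w))

for episode in range(200):
    s, done, ret, decisions = 0, False, 0.0, []
    while not done:
        p = explore_prob()
        fired = 1 if random.random() < p else 0  # neuron fires -> explore
        decisions.append((fired, p))
        a = random.randrange(2) if fired else (0 if Q[s][0] > Q[s][1] else 1)
        s2, r, done = step(s, a)
        # Standard off-policy Q-learning update.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        ret += r
        s = s2
    # Gradient-following (REINFORCE-style) update of the exploration
    # weight: each firing decision is reinforced by the episodic return.
    for fired, p in decisions:
        w += beta * ret * (fired - p)
```

After training, the greedy policy prefers moving right in every non-terminal state, while the exploration probability remains a learned, state-independent quantity — the "global" variant of exploratory data the abstract refers to. A local variant would keep one such neuron per starting state instead.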