Reinforcement learning with replacing eligibility traces
Machine Learning - Special issue on reinforcement learning
Introduction to Reinforcement Learning
Efficient Exploration In Reinforcement Learning
Using confidence bounds for exploitation-exploration trade-offs
The Journal of Machine Learning Research
Application of reinforcement learning to the game of Othello
Computers and Operations Research
Learning a Strategy with Neural Approximated Temporal-Difference Methods in English Draughts
ICPR '10: Proceedings of the 20th International Conference on Pattern Recognition
Value-difference based exploration: adaptive control between epsilon-greedy and softmax
KI'11: Proceedings of the 34th Annual German Conference on Advances in Artificial Intelligence
Adaptive exploration using stochastic neurons
ICANN'12: Proceedings of the 22nd International Conference on Artificial Neural Networks and Machine Learning, Volume Part II
Gradient-following algorithms are applied to the efficient adaptation of exploration parameters in temporal-difference learning with discrete action spaces. Global and local variants are evaluated in discrete and continuous state spaces. The global variant is memory-efficient, requiring exploratory data only for the starting states; the local variant, in contrast, requires exploratory data for each state of the state space, but produces exploratory behavior only in states with potential for improvement. Our results suggest that gradient-based exploration can be used efficiently in combination with both off-policy and on-policy algorithms such as Q-learning and Sarsa.
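The idea in the abstract can be illustrated with a minimal sketch. Here a single global exploration parameter (a softmax inverse temperature, here called beta) is adapted by following a REINFORCE-style score-function gradient weighted by the TD error, alongside an ordinary Q-learning update. The class and parameter names, the specific gradient estimator, and the two-armed bandit test problem are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def softmax(q, beta):
    """Boltzmann action distribution with inverse temperature beta."""
    z = beta * (q - q.max())          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

class GradientExplorationQ:
    """Q-learning with one global exploration parameter beta, adapted
    by gradient-following: the TD error is multiplied by the score
    function d/d_beta log pi(a|s) of the softmax policy (sketch)."""

    def __init__(self, n_states, n_actions, alpha=0.2, gamma=0.9,
                 beta=1.0, beta_lr=0.005):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.beta, self.beta_lr = beta, beta_lr

    def act(self, s, rng):
        p = softmax(self.Q[s], self.beta)
        return int(rng.choice(len(p), p=p))

    def update(self, s, a, r, s_next, done):
        # Standard off-policy Q-learning update.
        target = r if done else r + self.gamma * self.Q[s_next].max()
        delta = target - self.Q[s, a]
        self.Q[s, a] += self.alpha * delta

        # Score function of the softmax policy with respect to beta:
        # d log pi(a|s) / d beta = Q(s,a) - E_pi[Q(s,.)]
        p = softmax(self.Q[s], self.beta)
        score = self.Q[s, a] - p @ self.Q[s]
        self.beta += self.beta_lr * delta * score
        self.beta = max(self.beta, 1e-2)   # keep a little exploration

# Usage: a two-armed bandit (one state, arm 0 pays 1, arm 1 pays 0).
rng = np.random.default_rng(0)
agent = GradientExplorationQ(n_states=1, n_actions=2, gamma=0.0)
for _ in range(2000):
    a = agent.act(0, rng)
    r = 1.0 if a == 0 else 0.0
    agent.update(0, a, r, 0, done=True)
```

As the value estimates become reliable, positive TD errors on the better arm push beta upward, so exploration decreases automatically; a local variant would instead keep one beta per state. The same adaptation rule could be paired with an on-policy Sarsa target in place of the max in `update`.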