This paper presents "Value-Difference Based Exploration" (VDBE), a method for balancing exploration and exploitation in reinforcement learning. The method adapts the exploration parameter of ε-greedy according to the temporal-difference error observed from value-function backups, which is treated as a measure of the agent's uncertainty about the environment. VDBE is evaluated on a multi-armed bandit task, which gives insight into the behavior of the method. Preliminary results indicate that VDBE appears to be more robust to its parameter settings than commonly used ad hoc approaches such as ε-greedy or softmax.
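The idea can be illustrated with a minimal Python sketch of an ε-greedy bandit agent whose ε is driven by the size of its value updates. The abstract does not state the update rule, so everything specific here is an assumption made for illustration: the function name `vdbe_bandit`, the sigmoid-like squashing of the value change, the parameters `sigma` (sensitivity) and `alpha` (step size), the mixing weight `delta = 1 / n_arms`, and the Gaussian reward model.

```python
import math
import random


def vdbe_bandit(true_means, steps=1000, sigma=1.0, alpha=0.1, seed=0):
    """Hypothetical sketch: epsilon-greedy bandit with value-difference-driven epsilon."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    delta = 1.0 / n_arms   # assumed mixing weight for the epsilon update
    q = [0.0] * n_arms     # action-value estimates
    epsilon = 1.0          # start fully exploratory
    history = []

    for _ in range(steps):
        # Epsilon-greedy action selection with the current epsilon.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: q[a])

        # Assumed reward model: unit-variance Gaussian around the arm's mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Value-function backup; td_error is the observed temporal-difference error.
        td_error = reward - q[arm]
        q[arm] += alpha * td_error

        # Assumed squashing of the value change into [0, 1):
        # large changes push epsilon up, small changes let it decay.
        x = math.exp(-abs(alpha * td_error) / sigma)
        f = (1.0 - x) / (1.0 + x)
        epsilon = delta * f + (1.0 - delta) * epsilon
        history.append(epsilon)

    return q, history


if __name__ == "__main__":
    q, history = vdbe_bandit([0.2, 0.5, 1.0], steps=2000)
    print("estimated values:", [round(v, 2) for v in q])
    print("final epsilon:", round(history[-1], 3))
```

In this sketch, a smaller `sigma` makes ε react more strongly to a given value change, so the agent explores more while its estimates are still moving and less once they settle.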