Adaptive ε-greedy exploration in reinforcement learning based on value differences

Authors:
Michel Tokic
Affiliations:
Institute of Applied Research, University of Applied Sciences, Weingarten, Germany and Institute of Neural Information Processing, University of Ulm, Ulm, Germany
Venue:
KI'10 Proceedings of the 33rd annual German conference on Advances in artificial intelligence
Year:
2010

Abstract

This paper presents "Value-Difference Based Exploration" (VDBE), a method for balancing exploration and exploitation in reinforcement learning. VDBE adapts the exploration parameter of ε-greedy according to the temporal-difference error observed during value-function backups, which is treated as a measure of the agent's uncertainty about the environment. VDBE is evaluated on a multi-armed bandit task, which gives insight into the method's behavior. Preliminary results indicate that VDBE is more robust to parameter choice than commonly used ad hoc approaches such as ε-greedy or softmax.
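
The idea in the abstract can be illustrated with a minimal Python sketch on a multi-armed bandit. The abstract does not state the update rule, so everything specific here is an assumption made for illustration: the Boltzmann-style squashing of the absolute value difference, the parameter names `sigma` (sensitivity) and `delta` (mixing weight), the sample-average step size, the Gaussian rewards, and the helper names `vdbe_update` and `run_bandit`. It is not presented as the paper's exact procedure.

```python
import math
import random


def vdbe_update(epsilon, td_error, alpha, sigma, delta):
    """Move epsilon toward a squashed measure of the value difference.

    Assumed form: f = (1 - e^(-|alpha*td|/sigma)) / (1 + e^(-|alpha*td|/sigma)),
    then epsilon <- delta * f + (1 - delta) * epsilon.
    """
    x = math.exp(-abs(alpha * td_error) / sigma)
    f = (1.0 - x) / (1.0 + x)  # lies in [0, 1): large errors push f toward 1
    return delta * f + (1.0 - delta) * epsilon


def run_bandit(means, steps=1000, sigma=1.0, seed=0):
    """Run epsilon-greedy with an adaptive epsilon on a Gaussian bandit."""
    rng = random.Random(seed)
    n = len(means)
    q = [0.0] * n          # action-value estimates
    counts = [0] * n       # number of pulls per arm
    epsilon = 1.0          # start fully exploratory (assumption)
    delta = 1.0 / n        # mixing weight, here the inverse number of arms (assumption)
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                     # explore
        else:
            a = max(range(n), key=lambda i: q[i])    # exploit
        reward = rng.gauss(means[a], 1.0)
        counts[a] += 1
        alpha = 1.0 / counts[a]                      # sample-average step size
        td_error = reward - q[a]
        q[a] += alpha * td_error
        epsilon = vdbe_update(epsilon, td_error, alpha, sigma, delta)
    return q, epsilon


if __name__ == "__main__":
    q, eps = run_bandit([0.2, 0.5, 1.0])
    print("value estimates:", q)
    print("final epsilon:", eps)
```

In this sketch, large value differences keep epsilon high, so the agent keeps exploring while its estimates are still changing. As the estimates settle, the value differences shrink and epsilon decays toward greedy behavior. A smaller `sigma` makes epsilon react more strongly to a given error.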