Adaptive ε-greedy exploration in reinforcement learning based on value differences

  • Authors:
  • Michel Tokic

  • Affiliations:
  • Institute of Applied Research, University of Applied Sciences, Weingarten, Germany and Institute of Neural Information Processing, University of Ulm, Ulm, Germany

  • Venue:
  • KI'10: Proceedings of the 33rd Annual German Conference on Advances in Artificial Intelligence
  • Year:
  • 2010

Abstract

This paper presents "Value-Difference Based Exploration" (VDBE), a method for balancing the exploration/exploitation dilemma inherent to reinforcement learning. The proposed method adapts the exploration parameter of ε-greedy depending on the temporal-difference error observed from value-function backups, which is considered a measure of the agent's uncertainty about the environment. VDBE is evaluated on a multi-armed bandit task, which allows for insight into the behavior of the method. Preliminary results indicate that VDBE is more robust to its parameter settings than commonly used ad hoc approaches such as ε-greedy or softmax.
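The core idea of the abstract can be sketched as follows: the exploration rate ε is no longer a fixed constant but is nudged after every value-function backup by a function of the absolute temporal-difference error. The snippet below is an illustrative, minimal sketch of this kind of adaptive ε update on a two-armed bandit, not the paper's exact formulation; the parameter names `sigma` (sensitivity) and `delta` (mixing weight), their default values, and the bandit's reward probabilities are assumptions chosen for the example.

```python
import math
import random

def adaptive_epsilon(eps, td_error, sigma=1.0, delta=0.5):
    """One VDBE-style update of the exploration rate.

    A Boltzmann-like function f maps |td_error| into (0, 1): a large
    value difference (high uncertainty) pushes epsilon up, while a
    small one lets epsilon decay toward exploitation. sigma and delta
    are illustrative defaults, not values prescribed by the paper.
    """
    x = math.exp(-abs(td_error) / sigma)
    f = (1.0 - x) / (1.0 + x)
    return delta * f + (1.0 - delta) * eps

# Illustrative use on a hypothetical two-armed Bernoulli bandit.
random.seed(0)
q = [0.0, 0.0]            # action-value estimates
eps = 1.0                 # start fully exploratory
alpha = 0.1               # learning rate
true_means = [0.2, 0.8]   # assumed reward probabilities of the two arms

for _ in range(500):
    # epsilon-greedy action selection with the adaptive epsilon
    if random.random() < eps:
        a = random.randrange(2)
    else:
        a = max(range(2), key=q.__getitem__)
    r = 1.0 if random.random() < true_means[a] else 0.0
    td_error = r - q[a]            # bandit TD error (no bootstrapping)
    q[a] += alpha * td_error       # value-function backup
    eps = adaptive_epsilon(eps, alpha * td_error)

print(eps, q)
```

As the value estimates converge, the TD errors shrink, so ε decays on its own; a sudden change in the reward distribution would re-inflate the TD errors and hence ε, which is the self-tuning behavior the abstract attributes to VDBE.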