Gradient algorithms for exploration/exploitation trade-offs: global and local variants

  • Authors:
  • Michel Tokic; Günther Palm

  • Affiliations:
  • Institute of Neural Information Processing, University of Ulm, Germany, Institute of Applied Research, University of Applied Sciences, Ravensburg-Weingarten, Germany; Institute of Neural Information Processing, University of Ulm, Germany

  • Venue:
  • ANNPR'12 Proceedings of the 5th INNS IAPR TC 3 GIRPR conference on Artificial Neural Networks in Pattern Recognition
  • Year:
  • 2012

Abstract

Gradient-following algorithms are used to adapt exploration parameters efficiently in temporal-difference learning with discrete action spaces. Global and local variants are evaluated in discrete and continuous state spaces. The global variant is memory efficient because it requires exploratory data only for starting states. In contrast, the local variant requires exploratory data for each state of the state space, but produces exploratory behavior only in states with improvement potential. Our results suggest that gradient-based exploration can be combined efficiently with off-policy and on-policy algorithms such as Q-learning and Sarsa.
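
The abstract does not give the update rule, so the following is only a minimal sketch of what gradient-following adaptation of an exploration parameter could look like, not the authors' algorithm. It assumes the exploration rate is drawn from a Gaussian whose mean is adjusted with a REINFORCE-style step; the names sample_rate, gradient_step and ExplorationTable, the Gaussian parameterization, and all default values are hypothetical. The global/local distinction is illustrated purely by which states are used as keys.

```python
import random


def sample_rate(mu, sigma, rng):
    """Draw an exploration rate from N(mu, sigma^2), clipped to [0, 1].

    Assumption: a Gaussian over the exploration rate; clipping makes the
    gradient estimate slightly biased near the bounds.
    """
    return min(1.0, max(0.0, rng.gauss(mu, sigma)))


def gradient_step(mu, sigma, sampled, ret, baseline, lr):
    """REINFORCE-style step on the Gaussian mean.

    (sampled - mu) / sigma^2 is the derivative of log N(sampled; mu, sigma^2)
    with respect to mu; it is scaled by how much the return beat the baseline.
    """
    return mu + lr * (ret - baseline) * (sampled - mu) / sigma ** 2


class ExplorationTable:
    """Maps a state key to the mean of its exploration-rate distribution.

    Global variant (as sketched here): only starting states are used as keys.
    Local variant (as sketched here): every visited state is used as a key.
    """

    def __init__(self, mu0=0.5, sigma=0.2, lr=0.01):
        self.mu0, self.sigma, self.lr = mu0, sigma, lr
        self.mu = {}  # key -> current mean exploration rate

    def sample(self, key, rng):
        # Unseen keys start at the default mean mu0.
        return sample_rate(self.mu.setdefault(key, self.mu0), self.sigma, rng)

    def update(self, key, sampled, ret, baseline=0.0):
        mu = self.mu.get(key, self.mu0)
        new_mu = gradient_step(mu, self.sigma, sampled, ret, baseline, self.lr)
        self.mu[key] = min(1.0, max(0.0, new_mu))  # keep the mean a valid rate


# Usage: one entry per starting state (global) versus one per state (local).
rng = random.Random(0)
global_table = ExplorationTable()
eps = global_table.sample("start", rng)      # rate used for a whole episode
global_table.update("start", eps, ret=1.0)   # ret: hypothetical episode return

local_table = ExplorationTable()
for state in ["s0", "s1", "s2"]:
    eps_s = local_table.sample(state, rng)   # rate used only in this state
    local_table.update(state, eps_s, ret=0.5)
```

In this sketch the global table holds a single entry while the local table grows with the number of visited states, which mirrors the memory trade-off described in the abstract. The sampled rate would then drive action selection (for example epsilon-greedy) inside an ordinary Q-learning or Sarsa loop, which is omitted here.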