Evaluation of Batch-Mode Reinforcement Learning Methods for Solving DEC-MDPs with Changing Action Sets

  • Authors:
  • Thomas Gabel; Martin Riedmiller

  • Affiliations:
  • Neuroinformatics Group, Department of Mathematics and Computer Science, University of Osnabrück, 49069 Osnabrück, Germany (both authors)

  • Venue:
  • Recent Advances in Reinforcement Learning
  • Year:
  • 2008


Abstract

DEC-MDPs with changing action sets and partially ordered transition dependencies have recently been suggested as a sub-class of general DEC-MDPs that features provably lower complexity. In this paper, we investigate the usability of a coordinated batch-mode reinforcement learning algorithm for this class of distributed problems. Our agents acquire their local policies independently of the other agents by repeated interaction with the DEC-MDP while concurrently evolving their policies; the learning approach employed builds upon a specialized variant of a neural fitted Q iteration algorithm, enhanced for use in multi-agent settings. We applied our learning approach to various scheduling benchmark problems and obtained encouraging results, showing that problems at current standards of difficulty can be solved approximately very well, and in some cases optimally.
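To illustrate the batch-mode learning idea the abstract refers to, the following is a minimal sketch of fitted Q iteration over a fixed batch of transitions, with the "changing action set" aspect modeled by attaching to each successor state the set of actions currently available in it. This is a hypothetical tabular stand-in for the paper's neural fitted Q iteration variant, not the authors' actual implementation; all names and the toy data are illustrative assumptions.

```python
def fitted_q_iteration(transitions, gamma=0.9, iterations=50):
    """Tabular sketch of batch-mode fitted Q iteration.

    transitions: list of (state, action, reward, next_state, next_actions),
    where next_actions is the action set available in next_state.
    Because action sets change from state to state, the Bellman backup
    maximizes only over the actions executable in the successor state.
    """
    q = {}
    for _ in range(iterations):
        new_q = {}
        for s, a, r, s2, next_actions in transitions:
            # Max over only the currently available successor actions;
            # an empty action set marks a terminal state (future value 0).
            future = max((q.get((s2, a2), 0.0) for a2 in next_actions),
                         default=0.0)
            new_q[(s, a)] = r + gamma * future
        q = new_q
    return q


# Toy scheduling-flavored episode: in state "s0" the agent may run job
# "j1"; afterwards only "j2" remains, i.e. the action set has changed.
batch = [
    ("s0", "j1", -1.0, "s1", ["j2"]),
    ("s1", "j2", -1.0, "done", []),
]
q = fitted_q_iteration(batch, gamma=1.0, iterations=10)
# q[("s0", "j1")] converges to -2.0, q[("s1", "j2")] to -1.0
```

In the paper's multi-agent setting, each agent would run such a learner on its own local transition batch, with a neural network replacing the table as the Q-function approximator.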