On the power of global reward signals in reinforcement learning

  • Authors:
  • Thomas Kemmerich; Hans Kleine Büning

  • Affiliations:
  • International Graduate School Dynamic Intelligent Systems, University of Paderborn, Paderborn, Germany; Department of Computer Science, University of Paderborn, Paderborn, Germany

  • Venue:
  • MATES'11: Proceedings of the 9th German Conference on Multiagent System Technologies
  • Year:
  • 2011


Abstract

Reinforcement learning is investigated in various models, covering single- and multiagent settings as well as fully and partially observable domains. Although these models differ in several aspects, their basic approach is identical: agents obtain a state observation and a global reward signal from an environment and execute actions that in turn influence the environment state. In this work, we discuss the role of such global reward signals. We present a concept that provides no visible environment state but only an engineered numerical reward. We prove that this approach has the same computational complexity and expressive power as ordinary fully observable models, yet allows the assumptions of partially observable models to be violated. To avoid such violations, we argue that a reward should never carry, beyond the true reward value, any additional information that is decodable in polynomial time.
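The paper itself gives no code; as a rough illustration of the kind of violation the abstract warns against, the following Python sketch packs a discrete state identifier into the low-order fractional digits of a scalar reward, so that an agent in a nominally partially observable setting could recover the full state in polynomial time. All names (SCALE, encode, decode) and the digit-packing scheme are hypothetical and chosen only for illustration, not taken from the paper.

```python
# Hypothetical sketch: smuggling full state information into an "engineered"
# numerical reward, decodable in polynomial time. Not code from the paper.

SCALE = 10 ** 6  # digit budget; supports any state id below this bound


def encode(true_reward: int, state_id: int) -> float:
    """Hide state_id in the fractional digits of a non-negative integer reward."""
    assert true_reward >= 0 and 0 <= state_id < SCALE
    return true_reward + state_id / SCALE


def decode(reward: float) -> tuple[int, int]:
    """Recover (true_reward, state_id) in time polynomial in the digit count."""
    true_reward = int(reward)                     # integer part: genuine reward
    state_id = round((reward - true_reward) * SCALE)  # fraction: hidden state
    return true_reward, state_id


if __name__ == "__main__":
    r = encode(true_reward=42, state_id=1337)
    print(decode(r))  # (42, 1337): the "reward" alone reveals the state
```

In this toy scheme the agent nominally sees only a scalar reward, yet decode reconstructs the hidden state exactly, which is why the abstract argues rewards should carry no polynomial-time decodable information beyond the true reward value.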