Machine Learning
Reinforcement learning is studied in a variety of models, spanning single- and multiagent settings as well as fully and partially observable domains. Although these models differ in several respects, their basic approach is identical: agents receive a state observation and a global reward signal from an environment and execute actions that in turn influence the environment's state. In this work, we examine the role of such global reward signals. We present a concept that provides no visible environment state but only a numerically engineered reward. We prove that this approach has the same computational complexity and expressive power as ordinary fully observable models, yet allows assumptions made in partially observable models to be violated. To avoid such violations, we then argue that a reward, beyond its true reward value, should never carry additional information that is decodable in polynomial time.
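The closing argument — that a reward should carry nothing beyond its true value that is decodable in polynomial time — can be made concrete with a small sketch. The names `encode_reward`/`decode_reward` and the digit-packing scheme below are hypothetical illustrations, not the paper's construction: a scalar reward smuggles a discrete state id into its low-order digits, so an agent in a partially observable model could recover the hidden full state from the reward signal alone.

```python
# Hypothetical sketch of an "engineered" reward that leaks the environment
# state. Assumptions (illustrative only): true rewards have at most 3
# decimal places of meaningful precision, and there are fewer than 500
# discrete states, so the hidden offset stays below half a reward digit.

def encode_reward(true_reward: float, state_id: int) -> float:
    """Hide state_id in digits below the reward's meaningful precision."""
    assert 0 <= state_id < 500  # keeps the offset under 0.0005
    return true_reward + state_id * 1e-6

def decode_reward(reward: float) -> tuple[float, int]:
    """Recover both the true reward and the hidden state in linear time."""
    true_reward = round(reward, 3)                 # strip the hidden digits
    state_id = round((reward - true_reward) * 1e6)  # read them back out
    return true_reward, state_id
```

Because decoding is a constant-time arithmetic operation, any agent observing this reward effectively observes the full state, which is exactly the kind of infringement on partial-observability assumptions the abstract describes.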