Machine Learning
Reinforcement learning is studied in a variety of models, spanning single- and multiagent settings as well as fully and partially observable domains. Although these models differ in several respects, their basic approach is identical: agents receive a state observation and a global reward signal from an environment and execute actions that in turn influence the environment's state. In this work, we examine the role of such global reward signals. We present a concept that provides no visible environment state but only a numerically engineered reward. We prove that this approach has the same computational complexity and expressive power as ordinary fully observable models, yet allows assumptions made in partially observable models to be violated. To avoid such violations, we then argue that a reward, beyond its true reward value, should never carry additional information that is decodable in polynomial time.
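The closing argument — that a reward should carry nothing beyond its true value that is decodable in polynomial time — can be made concrete with a small sketch. The names `encode_reward`/`decode_reward` and the digit-packing scheme below are hypothetical illustrations, not the paper's construction: a scalar reward smuggles a discrete state id into its low-order digits, so an agent in a partially observable model could recover the hidden full state from the reward signal alone.

```python
# Hypothetical sketch of an "engineered" reward that leaks the environment
# state. Assumptions (illustrative only): true rewards have at most 3
# decimal places of meaningful precision, and there are fewer than 500
# discrete states, so the hidden offset stays below half a reward digit.

def encode_reward(true_reward: float, state_id: int) -> float:
    """Hide state_id in digits below the reward's meaningful precision."""
    assert 0 <= state_id < 500  # keeps the offset under 0.0005
    return true_reward + state_id * 1e-6

def decode_reward(reward: float) -> tuple[float, int]:
    """Recover both the true reward and the hidden state in linear time."""
    true_reward = round(reward, 3)                 # strip the hidden digits
    state_id = round((reward - true_reward) * 1e6)  # read them back out
    return true_reward, state_id
```

Because decoding is a constant-time arithmetic operation, any agent observing this reward effectively observes the full state, which is exactly the kind of infringement on partial-observability assumptions the abstract describes.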