Reinforcement Learning in RoboCup KeepAway with Partial Observability

Authors:
Sam Devlin; Marek Grzes; Daniel Kudenko
Venue:
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 02
Year:
2009

Abstract

Partially observable environments pose a major challenge to the application of reinforcement learning algorithms. In such environments, the system state representation frequently violates the Markov property, so an agent may have insufficient information to decide on the optimal action. In these cases, it is necessary to determine when information-gathering actions should be executed, that is, when the agent needs to reduce uncertainty about the current state before deciding how to act. One solution proposed in past research is to manually code rules for executing information-gathering actions into the policy, using heuristic (and likely faulty) knowledge. However, such a solution requires explicit expert knowledge about which actions gather information. In this paper, a flexible solution is proposed that automatically learns when to execute information-gathering actions and also automatically discovers which actions gather information. We present an evaluation in the RoboCup KeepAway domain that empirically shows the robustness of the proposed approach and its success in learning under varying degrees of partial observability. The approach therefore eliminates the need for hand-coded rules, is flexible across different situations, and does not require prior knowledge of which actions gather information.
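The abstract does not specify the learning algorithm or state representation used in the paper. As a rough illustration of the core idea only, the toy sketch below is our own construction and is neither the authors' method nor the KeepAway setup. It uses tabular Q-learning over observations rather than hidden states. A hidden bit is drawn each episode; the agent initially observes only `UNKNOWN`, which aliases both hidden states, so the Markov property fails for that observation. The hypothetical `PEEK` action costs a small reward and reveals the bit. The agent is never told that `PEEK` gathers information; it infers this from rewards alone and learns to peek before guessing.

```python
import random

# Hypothetical toy problem (NOT the paper's KeepAway domain): a hidden bit is
# drawn at the start of each episode. The agent only sees an observation,
# which is UNKNOWN until it executes PEEK, after which it is the bit itself.
UNKNOWN = 2
GUESS_0, GUESS_1, PEEK = 0, 1, 2
ACTIONS = (GUESS_0, GUESS_1, PEEK)
PEEK_COST = -0.1  # assumed small cost for the information-gathering action
MAX_STEPS = 5     # cap on episode length so repeated peeking cannot loop forever


def step(hidden_bit, obs, action):
    """Return (next_obs, reward, done). Only PEEK changes the observation."""
    if action == PEEK:
        return hidden_bit, PEEK_COST, False
    reward = 1.0 if action == hidden_bit else -1.0
    return obs, reward, True


def train(episodes=20000, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning keyed on observations, not hidden states."""
    rng = random.Random(seed)
    q = {o: [0.0, 0.0, 0.0] for o in (0, 1, UNKNOWN)}
    for _ in range(episodes):
        hidden_bit = rng.randint(0, 1)
        obs = UNKNOWN
        for _ in range(MAX_STEPS):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[obs][a])
            next_obs, reward, done = step(hidden_bit, obs, action)
            # Q-learning target: no bootstrap on terminal transitions.
            target = reward if done else reward + gamma * max(q[next_obs])
            q[obs][action] += alpha * (target - q[obs][action])
            obs = next_obs
            if done:
                break
    return q


def greedy(q, obs):
    """Greedy action for an observation under the learned Q-table."""
    return max(ACTIONS, key=lambda a: q[obs][a])


if __name__ == "__main__":
    q = train()
    for obs in (UNKNOWN, 0, 1):
        print(obs, greedy(q, obs), [round(v, 2) for v in q[obs]])
```

In this toy, guessing under `UNKNOWN` is worth about 0 in expectation, while peeking is worth roughly `PEEK_COST + gamma * 1`, so the learned greedy policy peeks first and then guesses the revealed bit. Whether information gathering pays off depends entirely on the assumed cost and reward values.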