TD-Gammon, a self-teaching backgammon program, achieves master-level play
Neural Computation
Artificial intelligence: a modern approach
Artificial intelligence: a modern approach
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Introduction to Reinforcement Learning
Introduction to Multiagent Systems
Introduction to Multiagent Systems
Scaling Reinforcement Learning toward RoboCup Soccer
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Dynamic Programming and Optimal Control, Vol. II
Dynamic Programming and Optimal Control, Vol. II
Planning and acting in partially observable stochastic domains
Artificial Intelligence
A heuristic variable grid solution method for POMDPs
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Hi-index | 0.00 |
Partially observable environments pose a major challenge to the application of reinforcement learning algorithms. In such environments, due to the Markov property frequently being violated in the system state representation, situations can occur where an agent has insufficient information to decide on the optimal action. In such cases, it is necessary to determine when information gathering actions should be executed, that is, when the agent needs to reduce uncertainty about the current state before deciding on how to act. One possible solution that has been proposed in past research is to manually code rules for execution of information gathering actions in the policy using heuristic (and likely faulty) knowledge. However, such a solution requires explicit expert knowledge about actions which are information gathering. In this paper a flexible solution is proposed which automatically learns when to execute information gathering actions and furthermore to automatically discover which actions gather information. We present an evaluation in the Robo{C}up Keep{A}way domain that empirically shows the robustness of the proposed approach and its success in learning under varying degrees of partial observability. Hence, it eliminates the need for hand-coded rules, is flexible in different situations and does not require knowledge about information gathering actions.