Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation

Authors:
Hajime Fujita;Shin Ishii
Affiliations:
Nara Institute of Science and Technology, Graduate School of Information Science, Ikoma, Nara 630-0192, Japan (hajime-f@is.naist.jp; ishii@is.naist.jp)
Venue:
Neural Computation
Year:
2007

Abstract

Games constitute a challenging domain of reinforcement learning (RL) for acquiring strategies because many of them involve multiple players and many unobservable variables in a large state space. The difficulty of solving such realistic multiagent problems with partial observability arises mainly because estimation and prediction over the whole state space, including the unobservable variables, are computationally too expensive. To overcome this intractability and enable an agent to learn in an unknown environment, an effective approximation method is required, together with explicit learning of the environmental model. We present a model-based RL scheme for large-scale multiagent problems with partial observability and apply it to the card game Hearts. This game is a well-defined example of an imperfect-information game and can be approximately formulated as a partially observable Markov decision process (POMDP) for a single learning agent. To reduce the computational cost, we use a sampling technique in which the heavy integration required for estimation and prediction is approximated using a reasonable number of samples. Computer simulation results show that our method is effective in solving such a difficult, partially observable multiagent problem.
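The sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes hidden states are drawn uniformly from the deals consistent with the unseen cards, whereas the paper estimates states with a learned environmental model. The names `sample_hidden_state`, `estimate_action_values`, and `value_fn` are hypothetical. Instead of summing over every possible assignment of unseen cards to opponents, the sketch averages a value function over a fixed number of sampled assignments.

```python
import random


def sample_hidden_state(unseen_cards, hand_sizes, rng):
    """Draw one assignment of the unseen cards to the opponents.

    Assumption: every consistent deal is equally likely. A learned
    belief model would replace this uniform draw.
    """
    cards = list(unseen_cards)
    if sum(hand_sizes) != len(cards):
        raise ValueError("hand sizes must account for every unseen card")
    rng.shuffle(cards)
    hands, start = [], 0
    for size in hand_sizes:
        hands.append(frozenset(cards[start:start + size]))
        start += size
    return tuple(hands)


def estimate_action_values(actions, unseen_cards, hand_sizes, value_fn,
                           n_samples=200, seed=0):
    """Monte Carlo estimate of the expected value of each action.

    value_fn(hidden_state, action) is a hypothetical evaluator that
    scores an action when the hidden state is fully known. The average
    over samples approximates the sum over all hidden states.
    """
    rng = random.Random(seed)
    totals = {action: 0.0 for action in actions}
    for _ in range(n_samples):
        hidden = sample_hidden_state(unseen_cards, hand_sizes, rng)
        for action in actions:
            totals[action] += value_fn(hidden, action)
    return {action: total / n_samples for action, total in totals.items()}
```

The cost of each estimate grows with `n_samples` rather than with the number of possible deals, which is the trade-off the abstract refers to: a larger sample count lowers the variance of the estimate at a proportionally higher computational cost.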