In this article we propose a method for certain combinatorial reinforcement learning tasks and demonstrate it on the popular arcade game Ms. Pac-Man. We define a set of high-level observation and action modules from which rule-based policies are constructed automatically. In these policies, actions are temporally extended and may run concurrently. The agent's policy is encoded as a compact decision list whose components are selected from a large pool of rules, which can be either handcrafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. The cross-entropy-optimized policies outperform our hand-crafted policy and reach the scores of average human players. We argue that learning succeeds mainly because (i) policies may apply concurrent actions, so the policy space is sufficiently rich, and (ii) the search is biased towards low-complexity policies, so solutions with a compact description are found quickly when they exist.
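To make the optimization step concrete, the following is a minimal sketch of how the cross-entropy method can select a subset of rules from a pool. It is not the paper's implementation: the pool size, the separable toy fitness function, and all parameter values are illustrative assumptions. Each candidate policy is a binary mask over the rule pool; rule i is included with probability p[i], and the elite samples of each iteration pull p toward high-scoring masks.

```python
import random

def cross_entropy_rule_selection(
    n_rules=20,      # size of the rule pool (hypothetical)
    pop_size=100,    # candidate policies sampled per iteration
    elite_frac=0.1,  # fraction of top scorers kept as elites
    alpha=0.6,       # smoothing step size for the parameter update
    iters=30,
    evaluate=None,   # maps a binary rule mask -> score (higher is better)
):
    """Select a subset of rules with the cross-entropy method."""
    p = [0.5] * n_rules                      # inclusion probabilities
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iters):
        # Sample a population of rule masks from the current distribution.
        population = [[1 if random.random() < pi else 0 for pi in p]
                      for _ in range(pop_size)]
        # Keep the best-scoring masks as elites.
        elites = sorted(population, key=evaluate, reverse=True)[:n_elite]
        # Move each inclusion probability toward the elite mean.
        for i in range(n_rules):
            elite_mean = sum(e[i] for e in elites) / n_elite
            p[i] = (1 - alpha) * p[i] + alpha * elite_mean
    # Threshold the converged probabilities into a final rule selection.
    return [1 if pi > 0.5 else 0 for pi in p]

# Toy separable fitness: reward including rules 0-4 and excluding the rest.
target = [1] * 5 + [0] * 15
score = lambda mask: -sum(abs(m - t) for m, t in zip(mask, target))

random.seed(0)
best = cross_entropy_rule_selection(evaluate=score)
```

In the real task the evaluate function would run the decision-list policy in the game and return its score; here the toy fitness lets the probabilities saturate within a few dozen iterations, illustrating point (ii) of the abstract: compactly describable solutions are found quickly.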