Non-stationary policy learning in 2-player zero sum games

Authors:
Steven Jensen; Daniel Boley; Maria Gini; Paul Schrater
Affiliations:
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN (all authors)
Venue:
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Year:
2005

Abstract

A key challenge in multiagent environments is the construction of agents that are able to learn while acting in the presence of other agents that are simultaneously learning and adapting. These domains require on-line learning methods without the benefit of repeated training examples, as well as the ability to adapt to the evolving behavior of other agents in the environment. The difficulty is further exacerbated when the agents are in an adversarial relationship, demanding that a robust (i.e., winning) non-stationary policy be rapidly learned and adapted. We propose an on-line sequence learning algorithm, ELPH, based on a straightforward entropy pruning technique that is able to rapidly learn and adapt to non-stationary policies. We demonstrate the performance of this method in a non-stationary learning environment of adversarial zero-sum matrix games.
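
The abstract does not give the details of ELPH, so the following is only a minimal sketch of the general idea it names: an on-line sequence predictor that discards high-entropy hypotheses. It is not the authors' implementation. The class name `EntropyPrunedPredictor`, the window length, the entropy threshold, the minimum count, and the choice of hypotheses (subsets of recent opponent moves) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def entropy(counts):
    """Shannon entropy in bits of a Counter of next-symbol counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


class EntropyPrunedPredictor:
    """Hypothetical on-line predictor; an illustration, not the paper's ELPH code."""

    def __init__(self, window=3, max_entropy=0.8, min_count=5):
        self.window = window            # assumed short-term memory length
        self.max_entropy = max_entropy  # assumed pruning threshold (bits)
        self.min_count = min_count      # assumed support needed before pruning
        self.history = []               # the last `window` observed symbols
        self.hypotheses = defaultdict(Counter)  # pattern -> next-symbol counts

    def _patterns(self):
        """Yield every non-empty subset of (lag, symbol) pairs in the window."""
        items = [(lag, s) for lag, s in enumerate(reversed(self.history), start=1)]
        for r in range(1, len(items) + 1):
            yield from combinations(items, r)

    def predict(self):
        """Return the likeliest next symbol under the lowest-entropy matching pattern."""
        matches = [self.hypotheses[p] for p in self._patterns() if p in self.hypotheses]
        if not matches:
            return None
        return min(matches, key=entropy).most_common(1)[0][0]

    def observe(self, symbol):
        """Update counts on-line, then prune patterns that have become unreliable."""
        for p in list(self._patterns()):
            counts = self.hypotheses[p]
            counts[symbol] += 1
            if sum(counts.values()) >= self.min_count and entropy(counts) > self.max_entropy:
                del self.hypotheses[p]  # a pruned pattern may be relearned from scratch
        self.history = (self.history + [symbol])[-self.window:]


# Usage: an opponent in rock-paper-scissors changes its cycle mid-game.
model = EntropyPrunedPredictor()
for move in "RPS" * 10:          # opponent cycles R -> P -> S
    model.observe(move)
first_guess = model.predict()    # last move was S

for move in "RSP" * 10 + "R":    # opponent switches to R -> S -> P
    model.observe(move)
second_guess = model.predict()   # last move was R
```

In this toy run the predictor answers `"R"` after the first phase and `"S"` after the second, whereas a model fixed on the first cycle would still expect `"P"` after `"R"`. In a zero-sum matrix game the agent would then play the best response to the predicted move; that step is omitted here.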