The traditional view of agent modelling is to infer the explicit parameters of another agent's strategy (i.e., its probability of taking each action in each situation). Unfortunately, in complex domains with high-dimensional strategy spaces, modelling every parameter often requires a prohibitive number of observations. Furthermore, even given a model of such a strategy, computing a response that is robust to modelling error may be impractical online. Instead, we propose an implicit modelling framework in which agents estimate the utility of a fixed portfolio of pre-computed strategies. Using the domain of heads-up limit Texas hold'em poker, this work describes an end-to-end approach for building an implicit modelling agent. We compute robust response strategies, show how to select strategies for the portfolio, and apply existing variance reduction and online learning techniques to dynamically adapt the agent's strategy to its opponent. We validate the approach by showing that our implicit modelling agent would have won the heads-up limit opponent exploitation event in the 2011 Annual Computer Poker Competition.
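The online learning component can be illustrated with a bandit-style sketch. The abstract does not specify which algorithm is used, so the following is an illustrative assumption: an Exp3-style learner that, each episode, picks one strategy from the portfolio, observes a bounded utility estimate for it, and re-weights the portfolio accordingly. The names `exp3_select` and `play`, and the Bernoulli utilities in the demo, are hypothetical stand-ins, not the authors' implementation.

```python
import math
import random

def exp3_select(play, K, T, gamma=0.1, seed=0):
    """Exp3 over K portfolio strategies for T episodes.

    play(i) -> utility in [0, 1] for using portfolio strategy i this
    episode (hypothetical interface). Returns per-strategy pick counts.
    """
    rng = random.Random(seed)
    weights = [1.0] * K
    counts = [0] * K
    for _ in range(T):
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / K for w in weights]
        i = rng.choices(range(K), weights=probs)[0]
        counts[i] += 1
        x = play(i)
        # Importance-weighted (unbiased) utility estimate for the chosen arm.
        x_hat = x / probs[i]
        weights[i] *= math.exp(gamma * x_hat / K)
        # Rescale so weights stay numerically stable over long runs.
        m = max(weights)
        weights = [w / m for w in weights]
    return counts

# Demo: three hypothetical portfolio strategies whose mean utilities
# against the current opponent are 0.3, 0.5, and 0.8.
means = [0.3, 0.5, 0.8]
payoff_rng = random.Random(1)
counts = exp3_select(lambda i: 1.0 if payoff_rng.random() < means[i] else 0.0,
                     K=3, T=5000)
```

In this sketch the learner concentrates its play on the highest-utility member of the portfolio while the uniform-exploration term keeps estimating the others, which is the core trade-off the abstract's "online learning" step must manage.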