The traditional view of agent modelling is to infer the explicit parameters of another agent's strategy (i.e., its probability of taking each action in each situation). Unfortunately, in complex domains with high-dimensional strategy spaces, modelling every parameter often requires a prohibitive number of observations. Furthermore, even given a model of such a strategy, computing a response that is robust to modelling error may be impractical online. Instead, we propose an implicit modelling framework in which agents estimate the utility of a fixed portfolio of pre-computed strategies. Using the domain of heads-up limit Texas hold'em poker, this work describes an end-to-end approach for building an implicit modelling agent. We compute robust response strategies, show how to select strategies for the portfolio, and apply existing variance reduction and online learning techniques to dynamically adapt the agent's strategy to its opponent. We validate the approach by showing that our implicit modelling agent would have won the heads-up limit opponent exploitation event in the 2011 Annual Computer Poker Competition.
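The online learning component can be illustrated with a bandit-style sketch. The abstract does not specify which algorithm is used, so the following is an illustrative assumption: an Exp3-style learner that, each episode, picks one strategy from the portfolio, observes a bounded utility estimate for it, and re-weights the portfolio accordingly. The names `exp3_select` and `play`, and the Bernoulli utilities in the demo, are hypothetical stand-ins, not the authors' implementation.

```python
import math
import random

def exp3_select(play, K, T, gamma=0.1, seed=0):
    """Exp3 over K portfolio strategies for T episodes.

    play(i) -> utility in [0, 1] for using portfolio strategy i this
    episode (hypothetical interface). Returns per-strategy pick counts.
    """
    rng = random.Random(seed)
    weights = [1.0] * K
    counts = [0] * K
    for _ in range(T):
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / K for w in weights]
        i = rng.choices(range(K), weights=probs)[0]
        counts[i] += 1
        x = play(i)
        # Importance-weighted (unbiased) utility estimate for the chosen arm.
        x_hat = x / probs[i]
        weights[i] *= math.exp(gamma * x_hat / K)
        # Rescale so weights stay numerically stable over long runs.
        m = max(weights)
        weights = [w / m for w in weights]
    return counts

# Demo: three hypothetical portfolio strategies whose mean utilities
# against the current opponent are 0.3, 0.5, and 0.8.
means = [0.3, 0.5, 0.8]
payoff_rng = random.Random(1)
counts = exp3_select(lambda i: 1.0 if payoff_rng.random() < means[i] else 0.0,
                     K=3, T=5000)
```

In this sketch the learner concentrates its play on the highest-utility member of the portfolio while the uniform-exploration term keeps estimating the others, which is the core trade-off the abstract's "online learning" step must manage.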