Efficient learning of multi-step best response

Authors:
Bikramjit Banerjee;Jing Peng
Affiliations:
Tulane University, New Orleans, LA;Tulane University, New Orleans, LA
Venue:
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
Year:
2005

Abstract

We provide a uniform framework for learning against a recent-history adversary in arbitrary repeated bimatrix games by modeling such an agent as a Markov Decision Process (MDP). We focus on learning an optimal non-stationary policy in such an MDP over a finite horizon, and we adapt an existing efficient Monte Carlo-based algorithm for learning optimal policies in such MDPs. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. This improvement comes at the cost of increased domain knowledge. However, a simple experiment in the Prisoner's Dilemma game shows that the error can still be small even when no extra domain knowledge is assumed beyond the opponent's known memory size.
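
As a rough illustration of the modeling idea in the abstract, the sketch below treats a memory-1 opponent in the Prisoner's Dilemma as an MDP whose state is the last joint action, and computes a finite-horizon non-stationary policy by backward induction. This is not the paper's Monte Carlo learning algorithm: it assumes the opponent's strategy is already known and plans exactly. The payoff values, the tit-for-tat opponent, and the names `PAYOFF`, `tit_for_tat`, and `plan` are all assumptions made for the example.

```python
from itertools import product

ACTIONS = ("C", "D")  # cooperate, defect

# Learner's payoff for (learner action, opponent action).
# Assumed standard Prisoner's Dilemma values, not taken from the paper.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(state):
    """Hypothetical memory-1 opponent: repeats the learner's previous action."""
    my_last, _opp_last = state
    return my_last


def plan(opponent, horizon):
    """Backward induction over states = last joint action.

    Returns (value, policy), where value[s] is the best total reward
    obtainable from state s with `horizon` steps left, and policy[t][s]
    is the learner's action at step t (0-indexed) in state s.
    """
    states = list(product(ACTIONS, ACTIONS))
    value = {s: 0.0 for s in states}  # zero steps left: nothing to earn
    policy = []
    for _ in range(horizon):
        new_value, step_policy = {}, {}
        for s in states:
            opp = opponent(s)  # opponent's move depends only on recent history
            # The next state is the joint action (a, opp) played now.
            best = max(ACTIONS, key=lambda a: PAYOFF[(a, opp)] + value[(a, opp)])
            step_policy[s] = best
            new_value[s] = PAYOFF[(best, opp)] + value[(best, opp)]
        value = new_value
        policy.insert(0, step_policy)  # earlier steps go to the front
    return value, policy


if __name__ == "__main__":
    value, policy = plan(tit_for_tat, horizon=3)
    start = ("C", "C")
    print(value[start], [step[start] for step in policy])
```

Under these assumptions the resulting policy depends on the number of steps remaining: against tit-for-tat it cooperates early and defects on the final step, which is the kind of behavior a stationary policy cannot express.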