We provide a uniform framework for learning against a recent-history adversary in arbitrary repeated bimatrix games by modeling such an agent as a Markov Decision Process (MDP). We focus on learning an optimal non-stationary policy in such an MDP over a finite horizon, and we adapt an existing efficient Monte Carlo-based algorithm for learning optimal policies in such MDPs. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, a simple experiment in the Prisoner's Dilemma game shows that even when no extra domain knowledge is assumed (beyond knowledge of the opponent's memory size), the error can still be small.
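To make the modeling idea concrete, the following is a minimal Python sketch, not the Monte Carlo algorithm the abstract refers to. It assumes a deterministic, fully known memory-1 opponent (a hypothetical `tit_for_tat` function) and illustrative Prisoner's Dilemma payoffs that are not taken from the paper. The MDP state is the last `memory` joint actions, and plain backward induction (swapped in here for the learning algorithm) computes a finite-horizon policy. The names `plan`, `REWARD`, `C`, and `D` are introduced only for this illustration.

```python
from itertools import product

# Actions: cooperate (C) and defect (D).
C, D = 0, 1

# Illustrative Prisoner's Dilemma payoffs for the learner (an assumption,
# not values from the paper): (learner action, opponent action) -> reward.
REWARD = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def tit_for_tat(history):
    """Hypothetical memory-1 opponent: copies the learner's last action."""
    if not history:
        return C
    return history[-1][0]  # each entry is (learner action, opponent action)


def plan(opponent, memory, horizon):
    """Backward induction on the MDP induced by a recent-history opponent.

    A state is the tuple of the last `memory` joint actions (shorter tuples
    cover the first rounds). Returns one policy dict per time step, plus the
    optimal total reward from every state at time 0.
    """
    if memory < 1:
        raise ValueError("memory must be at least 1")
    joint = list(product((C, D), repeat=2))
    states = [h for k in range(memory + 1) for h in product(joint, repeat=k)]
    value = {s: 0.0 for s in states}  # value after the final round
    policy = []
    for _ in range(horizon):
        new_value, step_policy = {}, {}
        for s in states:
            o = opponent(s)  # deterministic opponent reply (assumption)
            best_a, best_q = None, float("-inf")
            for a in (C, D):
                nxt = (s + ((a, o),))[-memory:]  # slide the history window
                q = REWARD[(a, o)] + value[nxt]
                if q > best_q:
                    best_a, best_q = a, q
            new_value[s], step_policy[s] = best_q, best_a
        value = new_value
        policy.insert(0, step_policy)  # steps are computed last to first
    return policy, value


policy, value = plan(tit_for_tat, memory=1, horizon=5)
print(value[()])               # optimal total reward from the empty history
print(policy[0][()])           # action in the first round
print(policy[4][((C, C),)])    # action in the last round after mutual cooperation
```

Under these assumptions the resulting policy is non-stationary: in the state reached after mutual cooperation it cooperates in early rounds but defects in the final round, which is why a single stationary mapping from states to actions would not suffice over a finite horizon.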