This paper focuses on learning in the presence of a Markovian teammate in ad hoc teams. A Markovian teammate's policy is a function of a set of discrete feature values derived from the joint history of interaction, where the feature values transition in a Markovian fashion at each time step. We introduce a novel algorithm, "Learning to Cooperate with a Markovian teammate" (LCM), that converges to optimal cooperation with any Markovian teammate and achieves safety with an arbitrary teammate. The novel aspect of LCM is the manner in which it satisfies these two goals via efficient exploration and exploitation. The main contribution of this paper is a full specification and a detailed analysis of LCM's theoretical properties.
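The definition of a Markovian teammate can be made concrete with a small sketch. The Python below is illustrative only and is not the LCM algorithm. It assumes one particular feature, the last `memory` joint actions, and the class `MarkovianTeammate`, the helper `copy_last_learner_action`, and the integer action encoding are all hypothetical names chosen for the example.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# Assumed encoding: a joint action is (learner action, teammate action).
JointAction = Tuple[int, int]
Features = Tuple[JointAction, ...]


class MarkovianTeammate:
    """Teammate whose policy depends only on discrete features of the joint history.

    Assumption for this sketch: the features are the last `memory` joint actions.
    """

    def __init__(self, memory: int, policy: Callable[[Features], int]) -> None:
        self.window: Deque[JointAction] = deque(maxlen=memory)
        self.policy = policy

    def features(self) -> Features:
        # Current discrete feature values derived from the joint history.
        return tuple(self.window)

    def act(self) -> int:
        # The policy is a function of the feature values only.
        return self.policy(self.features())

    def observe(self, joint_action: JointAction) -> None:
        # Markovian transition: the next feature values depend only on the
        # current feature values and the latest joint action.
        self.window.append(joint_action)


def copy_last_learner_action(features: Features) -> int:
    """Hypothetical policy: repeat the learner's previous action (0 at the start)."""
    return features[-1][0] if features else 0


teammate = MarkovianTeammate(memory=1, policy=copy_last_learner_action)
trace: List[int] = []
for learner_action in [1, 0, 1]:
    teammate_action = teammate.act()
    trace.append(teammate_action)
    teammate.observe((learner_action, teammate_action))

print(trace)
```

A learner facing such a teammate could in principle treat the feature values as the state of a decision process, which is why the Markovian property matters for learning to cooperate.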