Coordinated learning in multiagent MDPs with infinite state-space
Autonomous Agents and Multi-Agent Systems
In this paper we address the problem of coordination in multi-agent sequential decision problems with infinite state-spaces. We adopt a game-theoretic formalism to describe the interaction of the multiple decision-makers and propose the novel approximate biased adaptive play algorithm, an extension of biased adaptive play to team Markov games defined over infinite state-spaces. We establish that our method coordinates on the optimal strategy with probability 1 and discuss how it can be combined with approximate learning architectures. We conclude with two simple examples illustrating the application of our algorithm.
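The adaptive-play machinery underlying the abstract can be illustrated, in a drastically simplified finite setting, by a sketch like the following. It shows plain (unbiased) adaptive play in a two-agent, two-action pure-coordination game: each agent samples a few entries from a bounded history of joint plays and best-responds to the empirical distribution of the other agent's actions. The payoff table, memory length `M`, sample size `K`, and all names are illustrative assumptions, not the paper's construction; in particular, this omits both the bias toward optimal strategies and the infinite state-space that are the paper's actual contributions.

```python
import random
from collections import Counter

# Illustrative pure-coordination payoffs: agents are rewarded only
# for matching actions (a team game, so both get the same payoff).
PAYOFF = {("a", "a"): 10, ("b", "b"): 10, ("a", "b"): 0, ("b", "a"): 0}
ACTIONS = ["a", "b"]
M, K = 12, 4  # memory length and sample size (assumed values)

def best_response(opponent_samples):
    # Best response to the empirical distribution of the other
    # agent's sampled actions; ties break toward the first action.
    counts = Counter(opponent_samples)
    return max(ACTIONS,
               key=lambda a: sum(PAYOFF[(a, o)] * c for o, c in counts.items()))

def adaptive_play(steps=300, seed=0):
    rng = random.Random(seed)
    # History of joint actions, seeded with one random joint play.
    history = [(rng.choice(ACTIONS), rng.choice(ACTIONS))]
    for _ in range(steps):
        mem = history[-M:]
        # Each agent independently samples K joint plays from memory
        # and extracts the OTHER agent's component.
        sample_of_2 = [j[1] for j in rng.sample(mem, min(K, len(mem)))]
        sample_of_1 = [j[0] for j in rng.sample(mem, min(K, len(mem)))]
        history.append((best_response(sample_of_2), best_response(sample_of_1)))
    return history[-1]
```

Once the bounded memory contains only one coordinated joint action, every sample yields that action and the best responses reproduce it, so coordinated conventions are absorbing; the sampling noise is what lets the process escape miscoordinated cycles and reach one of them.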