Coherent cooperation among communicating problem solvers
IEEE Transactions on Computers
TD-Gammon, a self-teaching backgammon program, achieves master-level play
Neural Computation
Temporal difference learning and TD-Gammon
Communications of the ACM
Feature-based methods for large scale dynamic programming
Machine Learning - Special issue on reinforcement learning
The dynamics of reinforcement learning in cooperative multiagent systems
AAAI '98/IAAI '98 Proceedings of the fifteenth national conference on Artificial intelligence / tenth conference on Innovative applications of artificial intelligence
The asymptotic convergence-rate of Q-learning
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Learning in multiagent systems
Multiagent systems
Elevator Group Control Using Multiple Reinforcement Learning Agents
Machine Learning
Finite-sample convergence rates for Q-learning and indirect algorithms
Proceedings of the 1998 conference on Advances in neural information processing systems 11
Multiagent learning using a variable learning rate
Artificial Intelligence
Random Iterative Models
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Kernel-Based Reinforcement Learning
Machine Learning
The Complexity of Decentralized Control of Markov Decision Processes
Mathematics of Operations Research
Friend-or-Foe Q-learning in General-Sum Games
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Convergence Problems of General-Sum Multiagent Reinforcement Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Nash Convergence of Gradient Dynamics in General-Sum Games
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Coordinated Reinforcement Learning
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Stable Function Approximation in Dynamic Programming
Nash q-learning for general-sum stochastic games
The Journal of Machine Learning Research
Interpolation-based Q-learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Planning, learning and coordination in multiagent decision processes
TARK '96 Proceedings of the 6th conference on Theoretical aspects of rationality and knowledge
A Unified Analysis of Value-Function-Based Reinforcement Learning Algorithms
Neural Computation
An analysis of reinforcement learning with function approximation
Proceedings of the 25th international conference on Machine learning
Emerging coordination in infinite team Markov games
Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1
Markov Chains and Stochastic Stability
Learning of coordination: exploiting sparse interactions in multiagent systems
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
A framework for sequential planning in multi-agent settings
Journal of Artificial Intelligence Research
Sequential optimality and coordination in multiagent systems
IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 1
Rational and convergent learning in stochastic games
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Some studies in machine learning using the game of checkers. II: recent progress
IBM Journal of Research and Development
Some studies in machine learning using the game of checkers
IBM Journal of Research and Development
Value-function reinforcement learning in Markov games
Cognitive Systems Research
Social conformity and its convergence for reinforcement learning
MATES'10 Proceedings of the 8th German conference on Multiagent system technologies
Social welfare for automatic innovation
MATES'11 Proceedings of the 9th German conference on Multiagent system technologies
Adaptive edge detection with distributed behaviour-based agents in WSNs
International Journal of Sensor Networks
In this paper we address the problem of simultaneous learning and coordination in multiagent Markov decision problems (MMDPs) with infinite state-spaces. We separate this problem into two distinct subproblems: learning and coordination. To tackle the problem of learning, we use Q-learning with soft-state aggregation (Q-SSA), a well-known method from the reinforcement learning literature (Singh et al. in Advances in neural information processing systems. MIT Press, Cambridge, vol 7, pp 361–368, 1994). Q-SSA allows the agents in the game to approximate the optimal Q-function, from which the optimal policies can be computed. We establish the convergence of Q-SSA and introduce a new result describing its rate of convergence. In tackling the problem of coordination, we start by pointing out that knowledge of the optimal Q-function is not enough to ensure that all agents adopt a jointly optimal policy. We propose a novel coordination mechanism that, given knowledge of the optimal Q-function for an MMDP, ensures that all agents converge to a jointly optimal policy in every relevant state of the game. This coordination mechanism, approximate biased adaptive play (ABAP), extends biased adaptive play (Wang and Sandholm in Advances in neural information processing systems. MIT Press, Cambridge, vol 15, pp 1571–1578, 2003) to MMDPs with infinite state-spaces. Finally, we combine Q-SSA with ABAP, leading to a novel algorithm in which learning of the game and coordination take place simultaneously. We discuss several important properties of this new algorithm and establish its convergence with probability 1. We also provide simple illustrative examples of application.