Coordinated learning in multiagent MDPs with infinite state-space

Authors:
Francisco S. Melo; M. Isabel Ribeiro
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, USA 15213; Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal 1049-001
Venue:
Autonomous Agents and Multi-Agent Systems
Year:
2010

Abstract

In this paper we address the problem of simultaneous learning and coordination in multiagent Markov decision problems (MMDPs) with infinite state-spaces. We separate this problem into two distinct subproblems: learning and coordination. To tackle the problem of learning, we survey Q-learning with soft-state aggregation (Q-SSA), a well-known method from the reinforcement learning literature (Singh et al. in Advances in neural information processing systems. MIT Press, Cambridge, vol 7, pp 361-368, 1994). Q-SSA allows the agents in the game to approximate the optimal Q-function, from which the optimal policies can be computed. We establish the convergence of Q-SSA and introduce a new result describing the rate of convergence of this method. In tackling the problem of coordination, we start by pointing out that knowledge of the optimal Q-function is not enough to ensure that all agents adopt a jointly optimal policy. We propose a novel coordination mechanism that, given knowledge of the optimal Q-function for an MMDP, ensures that all agents converge to a jointly optimal policy in every relevant state of the game. This coordination mechanism, approximate biased adaptive play (ABAP), extends biased adaptive play (Wang and Sandholm in Advances in neural information processing systems. MIT Press, Cambridge, vol 15, pp 1571-1578, 2003) to MMDPs with infinite state-spaces. Finally, we combine Q-SSA with ABAP, which yields a novel algorithm in which learning of the game and coordination take place simultaneously. We discuss several important properties of this new algorithm and establish its convergence with probability 1. We also provide simple illustrative examples of its application.
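
To make the learning half of the abstract concrete, the following is a minimal sketch of a Q-learning update with soft-state aggregation, in the spirit of Singh et al. It is not taken from the paper. The one-dimensional state, the Gaussian membership function, the cluster centers, the step size, the discount factor, and all function names (`soft_membership`, `sample_cluster`, `q_ssa_update`, `q_of_state`) are assumptions chosen for illustration only.

```python
import math
import random

def soft_membership(s, centers, width=0.25):
    """Hypothetical membership probabilities P(x | s): normalized Gaussian bumps."""
    weights = [math.exp(-0.5 * ((s - c) / width) ** 2) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]

def sample_cluster(s, centers, rng):
    """Draw a cluster index x for state s according to P(x | s)."""
    probs = soft_membership(s, centers)
    return rng.choices(range(len(centers)), weights=probs, k=1)[0]

def q_ssa_update(Q, x, a, r, x_next, alpha=0.1, gamma=0.95):
    """One Q-learning step on the cluster-level table Q[cluster][joint_action]."""
    td_target = r + gamma * max(Q[x_next])
    Q[x][a] += alpha * (td_target - Q[x][a])
    return Q

def q_of_state(Q, s, centers):
    """Approximate Q(s, .) as the membership-weighted average of cluster values."""
    probs = soft_membership(s, centers)
    n_actions = len(Q[0])
    return [sum(p * Q[x][a] for x, p in enumerate(probs)) for a in range(n_actions)]

# Assumed toy setup: states in [0, 1], five clusters, four joint actions
# (for instance two agents with two individual actions each).
rng = random.Random(0)
centers = [0.0, 0.25, 0.5, 0.75, 1.0]
Q = [[0.0] * 4 for _ in centers]

# One made-up transition (s, a, r, s_next); in practice it comes from the environment.
s, a, r, s_next = 0.3, 2, 1.0, 0.35
x = sample_cluster(s, centers, rng)
x_next = sample_cluster(s_next, centers, rng)
q_ssa_update(Q, x, a, r, x_next)
```

Because the table is indexed by a finite set of clusters rather than by the state itself, its size does not depend on the size of the state-space. Note that a table over joint actions does not by itself tell each agent which of several equally good joint actions the others will pick, which is the coordination gap the abstract assigns to ABAP.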