Learning to cooperate via policy search

Authors:
Leonid Peshkin;Kee-Eung Kim;Nicolas Meuleau;Leslie Pack Kaelbling
Affiliations:
MIT AI Laboratory, Cambridge, MA;Computer Science Dept., Providence, RI;MIT AI Laboratory, Cambridge, MA;MIT AI Laboratory, Cambridge, MA
Venue:
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Year:
2000

Abstract

Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
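
As a rough illustration of the idea of gradient-based distributed policy search in a cooperative game, the following minimal sketch lets two agents each adjust their own softmax policy with a REINFORCE-style update, using only their own action and the shared reward. It is not the authors' algorithm or their soccer domain: the stateless 2x2 payoff matrix, the function names (train, expected_payoff), the learning rate, and the episode count are all assumptions made for this example, and the partial observability that motivates the paper is deliberately omitted.

```python
import math
import random

# Hypothetical 2x2 common-payoff game (an assumption for illustration):
# both agents receive PAYOFF[a1][a2] after choosing actions a1 and a2.
PAYOFF = [[1.0, 0.0],
          [0.0, 0.5]]


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Sample action 0 or 1 from a two-action distribution."""
    return 0 if rng.random() < probs[0] else 1


def train(episodes=5000, lr=0.1, seed=0):
    """Run independent REINFORCE-style updates for two agents."""
    rng = random.Random(seed)
    # One logit vector per agent; each agent only ever touches its own.
    theta = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        probs = [softmax(t) for t in theta]
        actions = [sample(p, rng) for p in probs]
        reward = PAYOFF[actions[0]][actions[1]]  # shared by both agents
        for i in range(2):
            for a in range(2):
                # Gradient of log pi_i(actions[i]) w.r.t. logit a.
                grad_log = (1.0 if a == actions[i] else 0.0) - probs[i][a]
                theta[i][a] += lr * reward * grad_log
    return [softmax(t) for t in theta]


def expected_payoff(policies):
    """Exact expected shared payoff of a pair of mixed policies."""
    p, q = policies
    return sum(p[a] * q[b] * PAYOFF[a][b]
               for a in range(2) for b in range(2))


if __name__ == "__main__":
    policies = train()
    print("agent policies:", policies)
    print("expected shared payoff:", expected_payoff(policies))
```

In this toy game both coordinated pure profiles, (0, 0) and (1, 1), are Nash equilibria, but only the first attains the maximum payoff. Depending on the random seed and starting parameters, a run of this sketch may in principle settle near either one, which gives a small, informal picture of why the relationship between local optima of gradient ascent and Nash equilibria is worth examining.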