Learning sequences of actions in collectives of autonomous agents

  • Authors:
  • Kagan Tumer; Adrian K. Agogino; David H. Wolpert

  • Affiliations:
  • NASA Ames Research Center, Moffett Field, CA; The University of Texas, Austin, TX; NASA Ames Research Center, Moffett Field, CA

  • Venue:
  • Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1
  • Year:
  • 2002

Abstract

In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resultant sequence of joint actions achieves a predetermined global objective. Directly applying Reinforcement Learning (RL) concepts to multi-agent systems often proves problematic: agents may work at cross-purposes, have difficulty evaluating their contribution to the global objective, or both. Accordingly, the crucial step in designing multi-agent systems is how to set the rewards for each agent's RL algorithm so that as the agents attempt to maximize those rewards, the system reaches a globally "desirable" solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence [15,23] to design rewards for the agents that are "aligned" with the global reward, and are "learnable" in that agents can readily see how their behavior affects their reward. We show that reinforcement learning agents using those rewards outperform both "natural" extensions of single-agent algorithms and global reinforcement learning solutions based on "team games".
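
To make the reward-design idea concrete, below is a minimal Python sketch (not the authors' code) of a "difference reward" of the kind used in the collective intelligence literature the abstract cites: each agent i is trained on D_i = G(z) − G(z with agent i removed), the global reward minus the global reward recomputed without that agent's contribution. The grid world, token values, and function names here are illustrative assumptions, not details taken from the paper.

```python
def global_reward(token_values, visited_cells):
    """G: each token's value counts once, however many agents reach its cell."""
    return sum(token_values.get(cell, 0.0) for cell in visited_cells)

def difference_reward(agent, visits_by_agent, token_values):
    """Difference reward D_i = G(z) - G(z_{-i}): the global reward minus
    the global reward recomputed with agent i's visits removed. It is
    aligned with G, yet reflects only agent i's own contribution."""
    all_cells = set().union(*visits_by_agent.values())
    cells_without_agent = set().union(
        *(v for a, v in visits_by_agent.items() if a != agent))
    return (global_reward(token_values, all_cells)
            - global_reward(token_values, cells_without_agent))

# Two agents in a toy grid: "a" alone reaches the high-value cell, so it
# receives full credit for it; "b" only revisits a cell "a" already
# covered, so its difference reward is zero. A shared "team game" reward
# would hand both agents the same signal (6.0) and blur this credit.
token_values = {(0, 1): 1.0, (2, 2): 5.0}
visits_by_agent = {"a": {(0, 1), (2, 2)}, "b": {(0, 1)}}
print(difference_reward("a", visits_by_agent, token_values))  # 5.0
print(difference_reward("b", visits_by_agent, token_values))  # 0.0
```

Under these assumptions the sketch illustrates both properties the abstract names: the reward is "aligned" because an agent can only raise D_i by raising G, and "learnable" because D_i filters out reward changes caused by other agents' actions.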