QUICR-learning for multi-agent coordination

Authors:
Adrian K. Agogino;Kagan Tumer
Affiliations:
UCSC, NASA Ames Research Center, Moffett Field, CA;NASA Ames Research Center, Moffett Field, CA
Venue:
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Year:
2006

Citing 10
Cited 3

Technical Note: \cal Q-Learning

Machine Learning
Learning sequences of actions in collectives of autonomous agents

Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 1
Introduction to Reinforcement Learning

Introduction to Reinforcement Learning
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Coordinated Reinforcement Learning

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Coordination in multiagent reinforcement learning: a Bayesian approach

AAMAS '03 Proceedings of the second international joint conference on Autonomous agents and multiagent systems
Unifying Temporal and Structural Credit Assignment Problems

AAMAS '04 Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2
Hierarchical reinforcement learning with the MAXQ value function decomposition

Journal of Artificial Intelligence Research
Reinforcement learning: a survey

Journal of Artificial Intelligence Research
Taming decentralized POMDPs: towards efficient policy computation for multiagent settings

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Automatic shaping and decomposition of reward functions

Proceedings of the 24th international conference on Machine learning
Towards incremental social learning in optimization and multiagent systems

Proceedings of the 10th annual conference companion on Genetic and evolutionary computation
Distributed Reinforcement Learning for Coordinate Multi-Robot Foraging

Journal of Intelligent and Robotic Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Coordinating multiple agents that need to perform a sequence of actions to maximize a system level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t′ t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning. The second credit assignment problem is typically addressed by creating custom reward functions. To address both credit assignment problems simultaneously, we propose the "Q Updates with Immediate Counterfactual Rewards-learning" (QUICR-learning) designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. QUICR-learning is based on previous work on single-time-step counterfactual rewards described by the collectives framework. Results on a traffic congestion problem shows that QUICR-learning is significantly better than a Q-learner using collectives-based (single-time-step counterfactual) rewards. In addition QUICR-learning provides significant gains over conventional and local Q-learning. Additional results on a multi-agent grid-world problem show that the improvements due to QUICR-learning are not domain specific and can provide up to a ten fold increase in performance over existing methods.