DCOPs and bandits: exploration and exploitation in decentralised coordination

  • Authors:
  • Ruben Stranders; Long Tran-Thanh; Francesco M. Delle Fave; Alex Rogers; Nicholas R. Jennings

  • Affiliations:
  • University of Southampton (all authors)

  • Venue:
  • Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
  • Year:
  • 2012

Abstract

Real-life coordination problems are characterised by stochasticity and a lack of a priori knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely adopted framework for modelling decentralised coordination tasks, assume perfect knowledge of these factors, thus limiting their practical applicability. To address this shortcoming, we introduce the MAB-DCOP, in which the interactions between agents are modelled by multi-armed bandits (MABs). Unlike canonical DCOPs, a MAB-DCOP is not a single-shot optimisation problem. Rather, it is a sequential one in which agents need to coordinate in order to strike a balance between acquiring knowledge about the a priori unknown and stochastic interactions (exploration) and taking the currently believed optimal joint action (exploitation), so as to maximise the cumulative global utility over a finite time horizon. We propose Heist, the first asymptotically optimal algorithm for coordination under stochasticity and a lack of prior knowledge. Heist solves MAB-DCOPs in a decentralised fashion, using a generalised distributive law (GDL) message-passing phase to find the joint action with the highest upper confidence bound (UCB) on global utility. In experimental settings, we demonstrate that Heist outperforms other state-of-the-art techniques from the MAB and DCOP literature by up to 1.5 orders of magnitude on MAB-DCOPs.
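The abstract gives no pseudocode, but the exploration bonus at Heist's core is the classic UCB rule from the bandit literature. The following is a minimal, self-contained single-agent UCB1 sketch for orientation only (all names are hypothetical); it is not Heist itself, which lifts the same confidence bound from individual arms to joint actions via GDL message passing:

```python
import math
import random

class UCB1:
    """Hypothetical single-agent UCB1 sketch, for illustration only."""

    def __init__(self, n_arms: int):
        self.counts = [0] * n_arms   # times each arm has been pulled
        self.means = [0.0] * n_arms  # running mean reward per arm

    def select(self, t: int) -> int:
        # Pull each arm once before confidence bounds are defined.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        # UCB1: empirical mean plus exploration bonus sqrt(2 ln t / n_arm).
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a]
            + math.sqrt(2 * math.log(t) / self.counts[a]),
        )

    def update(self, arm: int, reward: float) -> None:
        # Incrementally update the running mean for the pulled arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Usage: a two-armed bandit with hidden Bernoulli payoffs.
bandit = UCB1(n_arms=2)
payoff = [0.3, 0.7]  # hidden ground truth, unknown to the agent
for t in range(1, 1001):
    arm = bandit.select(t)
    bandit.update(arm, 1.0 if random.random() < payoff[arm] else 0.0)
print(bandit.counts)  # most pulls should concentrate on the better arm
```

In a MAB-DCOP, the analogous bound is placed on the global utility of a joint action rather than on a single agent's arm; the GDL message-passing phase lets the agents jointly identify, in a decentralised fashion, the joint action whose global UCB is highest.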