Emergent consensus in decentralised systems using collaborative reinforcement learning

  • Authors:
  • Jim Dowling; Raymond Cunningham; Anthony Harrington; Eoin Curran; Vinny Cahill

  • Affiliations:
  • Distributed Systems Group, Trinity College, Dublin, Ireland (all authors)

  • Venue:
  • Self-star Properties in Complex Information Systems
  • Year:
  • 2005

Abstract

This paper describes the application of a decentralised coordination algorithm, Collaborative Reinforcement Learning (CRL), to two distributed system problems: SAMPLE, a routing protocol for ad-hoc networks, and UTC-CRL, a next-generation urban traffic control system. CRL enables independent agents to establish consensus in support of optimising system-wide properties in distributed systems that provide no global state. Interacting agents reach consensus on local environmental or system properties by advertising policy information locally and by using the advertisements they receive to update their local, partial view of the system. Because CRL assumes that all agents evaluate advertisements in the same way, advertisements that improve the solution to the system optimisation problem tend to propagate quickly through the system, enabling it to collectively adapt its behaviour to a changing environment. We evaluate CRL experimentally in SAMPLE by comparing its system-wide routing performance with that of existing ad-hoc routing protocols under changing environmental conditions, such as congestion and link unreliability. Because SAMPLE establishes consensus between routing agents on stable routes, even as congestion levels in the network change, it demonstrates improved performance and self-management properties. By applying CRL to the UTC scenario, we hope to validate experimentally the suitability of CRL for another system optimisation problem.
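The advertisement mechanism described above can be illustrated with a minimal sketch. It is not the CRL update rule from the paper. It swaps in a simplified distance-vector (Bellman-style) cost estimate so that the propagation idea is visible. The assumptions are as follows. Each agent keeps only a local, partial view made up of the last cost-to-goal values advertised by its neighbours. Every agent evaluates advertisements with the same rule, which mirrors the homogeneity assumption. The names `Agent`, `run_rounds`, the four-node topology and the link costs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent holding only a local, partial view of the system."""
    name: str
    is_goal: bool = False
    # Local view: neighbour name -> last cost-to-goal that neighbour advertised.
    view: dict = field(default_factory=dict)

    def estimate(self, link_costs):
        """Evaluate advertisements with the same rule at every agent (homogeneity).

        Assumed rule (distance-vector style, not CRL's actual update):
        own cost = min over neighbours of (link cost + advertised cost).
        """
        if self.is_goal:
            return 0.0
        options = [link_costs[n] + v for n, v in self.view.items()]
        return min(options, default=float("inf"))


def run_rounds(agents, links, rounds):
    """Synchronous rounds of local advertisement.

    links maps each agent name to {neighbour name: link cost}. Agents only
    ever receive advertisements from their direct neighbours, so there is
    no global state.
    """
    for _ in range(rounds):
        # Every agent advertises its current estimate...
        adverts = {name: a.estimate(links[name]) for name, a in agents.items()}
        # ...and each agent updates its partial view from its neighbours only.
        for name, neighbours in links.items():
            for n in neighbours:
                agents[name].view[n] = adverts[n]
    return {name: a.estimate(links[name]) for name, a in agents.items()}


# Hypothetical topology: A-B-C-D chain plus an expensive direct A-C link; D is the goal.
links = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 5.0, "B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
agents = {name: Agent(name, is_goal=(name == "D")) for name in links}
estimates = run_rounds(agents, links, rounds=3)
print(estimates)
```

After three rounds, the cheap route information advertised by D has spread hop by hop, and every agent agrees on a consistent set of estimates (A reaches D at cost 3 through B rather than 5 + 1 through the direct A-C link). No agent ever sees the whole topology. A real CRL agent would also model unreliable links and stale advertisements, which this sketch deliberately omits.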