A lightweight idempotent messaging protocol for faulty networks

  • Authors:
  • Jeremy Brown; J. P. Grossman; Tom Knight

  • Affiliations:
  • M.I.T., Cambridge, MA; M.I.T., Cambridge, MA; M.I.T., Cambridge, MA

  • Venue:
  • Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
  • Year:
  • 2002

Abstract

As parallel machines scale to one million nodes and beyond, it becomes increasingly difficult to build a reliable network that is able to guarantee packet delivery. Eventually large systems will need to employ fault-tolerant messaging protocols that afford correct execution in the presence of a lossy network. In this paper we present a lightweight protocol that preserves message idempotence and is easy to implement in hardware. We identify the requirements for a correct implementation of the protocol. Experiments are performed in simulation to determine implementation parameters that optimize performance. We find that an aggressive implementation on a fat tree network results in a slowdown of less than 2x compared to buffered wormhole routing on a fault-free network.
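
To make the idea of preserving idempotence over a lossy network concrete, here is a minimal Python sketch of one generic approach: the sender retransmits until it sees an acknowledgement, and the receiver remembers message IDs so that a retransmitted message is applied only once. This is not the protocol from the paper, whose details are not given in the abstract. The names LossyLink, Receiver, send_reliably, and demo are hypothetical, as are the loss model (independent drops of requests and acknowledgements) and the unbounded set of seen IDs, which a hardware implementation would presumably need to bound.

```python
import random


class LossyLink:
    """Hypothetical channel that drops each packet independently with probability `loss`."""

    def __init__(self, loss, seed=0):
        self.loss = loss
        self.rng = random.Random(seed)

    def deliver(self):
        # True if this packet survives the trip, False if it is dropped.
        return self.rng.random() >= self.loss


class Receiver:
    """Applies each message at most once by remembering the IDs it has seen."""

    def __init__(self):
        self.seen = set()   # assumption: unbounded memory of message IDs
        self.counter = 0    # state changed by a non-idempotent operation

    def on_packet(self, msg_id, amount):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.counter += amount  # applied only on first arrival
        return msg_id  # always acknowledge, even for duplicates


def send_reliably(link, receiver, msg_id, amount, max_tries=1000):
    """Retransmit until an acknowledgement gets back; return the number of attempts."""
    for attempt in range(1, max_tries + 1):
        if not link.deliver():
            continue  # request lost in the network; retransmit
        receiver.on_packet(msg_id, amount)
        if link.deliver():
            return attempt  # acknowledgement arrived
        # acknowledgement lost; the retransmission will be a duplicate
    raise RuntimeError("no acknowledgement after max_tries attempts")


def demo(loss=0.3, messages=100, seed=0):
    """Send `messages` increments of 1 and return (final counter, total attempts)."""
    link = LossyLink(loss, seed)
    rx = Receiver()
    tries = sum(send_reliably(link, rx, i, 1) for i in range(messages))
    return rx.counter, tries
```

Calling demo() returns the final counter together with the total number of transmission attempts. The counter equals the number of distinct messages even though lost acknowledgements cause some messages to reach the receiver more than once, which is the property that duplicate suppression is meant to provide.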