A routing methodology for dynamic fault tolerance in meshes and tori

  • Authors:
  • Nils Agne Nordbotten; Tor Skeie

  • Affiliations:
  • Simula Research Laboratory, Lysaker, Norway; Simula Research Laboratory, Lysaker, Norway and Department of Informatics, University of Oslo, Norway

  • Venue:
  • HiPC'07: Proceedings of the 14th International Conference on High Performance Computing
  • Year:
  • 2007

Abstract

This paper proposes a fully distributed fault-tolerant routing methodology for tori and meshes. It supports a dynamic fault model, so the network remains fully operational at all times. Unlike most previous proposals that support a dynamic fault model, the methodology tolerates concave fault regions, and so avoids disabling healthy nodes in most practical scenarios. It achieves high network performance through adaptive routing and degrades gracefully in the presence of faults.
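
To illustrate why tolerating concave fault regions matters, the following minimal sketch counts the healthy nodes that would have to be disabled if a concave region had to be made convex first. It is not the paper's routing methodology. It assumes a 2D mesh with nodes addressed as (x, y) pairs, and it assumes "convex" means an axis-aligned rectangular block, which is one common convention and may differ from the definition the authors use. The function names `bounding_block` and `healthy_nodes_sacrificed` are hypothetical.

```python
from typing import Set, Tuple

Node = Tuple[int, int]  # assumed (x, y) coordinates in a 2D mesh


def bounding_block(faulty: Set[Node]) -> Set[Node]:
    """Return every node in the smallest axis-aligned rectangle covering `faulty`.

    Assumption: a "convex" fault region is modelled as a rectangular block.
    """
    if not faulty:
        return set()
    xs = [x for x, _ in faulty]
    ys = [y for _, y in faulty]
    return {
        (x, y)
        for x in range(min(xs), max(xs) + 1)
        for y in range(min(ys), max(ys) + 1)
    }


def healthy_nodes_sacrificed(faulty: Set[Node]) -> Set[Node]:
    """Healthy nodes that would be disabled to turn `faulty` into a block."""
    return bounding_block(faulty) - faulty


# A U-shaped (concave) fault region: two faulty columns joined at the bottom.
u_shaped = {(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2), (3, 3)}
sacrificed = healthy_nodes_sacrificed(u_shaped)
print(sorted(sacrificed))
```

Under these assumptions, a scheme restricted to rectangular fault regions would disable the two healthy nodes enclosed by the U shape, whereas a scheme that tolerates the concave region directly could leave them in service.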