Routing in Modular Fault-Tolerant Multiprocessor Systems

  • Authors:
  • M. Sultan Alam;Rami G. Melhem

  • Affiliations:
  • -;-

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1995

Quantified Score

Hi-index 0.01

Visualization

Abstract

In this paper, we consider a class of modular multiprocessor architectures in which spares are added to each module to cover for faulty nodes within that module, thus forming a fault-tolerant basic block (FTBB). In contrast to reconfiguration techniques that preserve the physical adjacency between active nodes in the system, our goal is to preserve the logical adjacency between active nodes by means of a routing algorithm which delivers messages successfully to their destinations. We introduce two-phase routing strategies that route messages first to their destination FTBB, and then to the destination nodes within the destination FTBB. Such a strategy may be applied to a variety of architectures including binary hypercubes and three-dimensional tori. In the presence of f faults in hypercubes and tori, we show that the worst case length of the message route is min {驴+f, (K+ 1)驴}+c where 驴 is the shortest path in the absence of faults, K is the number of spare nodes in an FTBB, and c is a small constant. The average routing overhead is much lower than the worst case overhead.