Adaptive Fault-Tolerant Deadlock-Free Routing in Meshes and Hypercubes

  • Authors:
  • Chien-Chun Su;Kang G. Shin

  • Affiliations:
  • -;-

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 1996

Abstract

We present an adaptive deadlock-free routing algorithm which decomposes a given network into two virtual interconnection networks, VIN1 and VIN2. VIN1 supports deterministic deadlock-free routing, and VIN2 supports fully-adaptive routing. Whenever a channel in VIN1 or VIN2 is available, it can be used to route a message.

Each node is identified to be in one of three states: safe, unsafe, and faulty. The unsafe state is used for deadlock-free routing, and an unsafe node can still send and receive messages. When nodes become faulty/unsafe, some channels in VIN2 around the faulty/unsafe nodes are used as the detours of those channels in VIN1 passing through the faulty/unsafe nodes, i.e., the adaptability in VIN2 is transformed to support fault-tolerant deadlock-free routing. Using information on the state of each node's neighbors, we have developed an adaptive fault-tolerant deadlock-free routing scheme for n-dimensional meshes and hypercubes with only two virtual channels per physical link.

In an n-dimensional hypercube, any pattern of faulty nodes can be tolerated as long as the number of faulty nodes is no more than $\lceil\, n/2 \,\rceil$. The maximum number of faulty nodes that can be tolerated is $2^{n-1}$, which occurs when all faulty nodes can be encompassed in an $(n-1)$-cube. In an n-dimensional mesh, we use a more general fault model, called a disconnected rectangular block. Any arbitrary pattern of faulty nodes can be modeled as a rectangular block after finding both unsafe and disabled nodes (which are then treated as faulty nodes). This concept can also be applied to k-ary n-cubes with four virtual channels, two in VIN1 and the other two in VIN2. Finally, we present simulation results for both hypercubes and 2-dimensional meshes by using various workloads and fault patterns.
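The hypercube fault bounds quoted in the abstract can be made concrete with a small sketch. It is not the paper's routing algorithm: it only evaluates the two stated bounds and tests whether a set of faulty nodes lies inside a single (n-1)-dimensional subcube. It assumes nodes are labeled with n-bit integer addresses, so that an (n-1)-subcube is the set of nodes sharing one fixed address bit. The function names are hypothetical and chosen for illustration.

```python
def guaranteed_fault_bound(n: int) -> int:
    """Fault count tolerated for any fault pattern, per the abstract: ceil(n/2)."""
    return (n + 1) // 2  # integer form of ceil(n/2)


def max_fault_bound(n: int) -> int:
    """Largest fault count quoted in the abstract: 2^(n-1), the size of an (n-1)-cube."""
    return 2 ** (n - 1)


def fits_in_subcube(faults, n: int) -> bool:
    """True if every faulty address agrees on some bit position.

    Assumption: nodes are n-bit integers, and fixing one bit selects
    an (n-1)-dimensional subcube of the n-cube.
    """
    faults = list(faults)
    if not faults:
        return True
    for d in range(n):
        # Collect the values that bit d takes across all faulty nodes.
        bits = {(f >> d) & 1 for f in faults}
        if len(bits) == 1:
            return True
    return False
```

For a 4-cube, the guaranteed bound is 2 faulty nodes and the quoted maximum is 8. The faults `0b0000`, `0b0011`, and `0b0101` all have their highest bit clear, so they fit in one 3-cube, whereas `0b0000` and `0b1111` differ in every bit and do not.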