Fault-Tolerant Processor Arrays Using Additional Bypass Linking Allocated by Graph-Node Coloring

  • Authors:
  • Nobuo Tsuda

  • Affiliations:
  • Kanazawa Institute of Technology, Ishikawa, Japan

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2000

Abstract

An advanced spare-connection scheme for k-out-of-n redundancy called "generalized additional bypass linking" is proposed for constructing fault-tolerant massively parallel computers with series-connected, mesh-connected, or tree-connected processing element (PE) arrays. This scheme uses bypass links with wired-OR connections to selectively connect the primary PEs to a spare PE in parallel. These bypass links are allocated to the primary PEs by node-coloring of a graph with a minimum inter-node distance of three in order to minimize the number of bypass links (i.e., the chromatic number). The main advantage of this scheme is that it can be used for constructing various k-out-of-n configurations capable of enhanced PE-to-PE communication and broadcast while still achieving strong fault tolerance for these PEs and links. In particular, it enables the construction of optimal r-strongly-fault-tolerant configurations capable of direct k-out-of-n selections by providing r spare PEs and r extra connections per PE for any kind of array when node-coloring with a distance of three is used. This simple spare-circuit structure enhances fault tolerance more than conventional schemes do. The node-coloring patterns were constructed using new node-coloring algorithms, and the chromatic numbers were evaluated theoretically. Enhanced PE-to-PE communication and broadcast were achieved by using new fault-tolerant routing algorithms based on the properties of the node-coloring patterns; in the optimal configurations, these require four or five message-transmission steps for an array of any size.
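The distance-three coloring constraint can be illustrated with a minimal Python sketch for a mesh-connected array. It is not the paper's algorithm: the coloring rule (x + 2y) mod 5 is a standard pattern assumed here for illustration, and the names `mesh_color`, `min_same_color_distance`, and `bypass_groups` are hypothetical. The sketch assumes, on our reading of the abstract, that PEs with the same color share one bypass link, so the number of colors equals the number of bypass links.

```python
from itertools import product


def mesh_color(x, y):
    """Assumed coloring rule for a 2D mesh: five colors, (x + 2y) mod 5."""
    return (x + 2 * y) % 5


def min_same_color_distance(rows, cols, color):
    """Smallest Manhattan distance between two distinct PEs of the same color."""
    nodes = list(product(range(rows), range(cols)))
    best = None
    for i, (x1, y1) in enumerate(nodes):
        for x2, y2 in nodes[i + 1:]:
            if color(x1, y1) == color(x2, y2):
                d = abs(x1 - x2) + abs(y1 - y2)
                if best is None or d < best:
                    best = d
    return best


def bypass_groups(rows, cols, color):
    """Group PE coordinates by color; each group would share one bypass link."""
    groups = {}
    for x, y in product(range(rows), range(cols)):
        groups.setdefault(color(x, y), []).append((x, y))
    return groups


if __name__ == "__main__":
    rows, cols = 6, 6
    print("minimum same-color distance:", min_same_color_distance(rows, cols, mesh_color))
    for c, members in sorted(bypass_groups(rows, cols, mesh_color).items()):
        print("color", c, "->", len(members), "PEs")
```

Five colors cannot be improved on for an interior mesh node under this constraint, since the node and its four neighbors are pairwise within distance two and must all differ. How the paper handles series-connected and tree-connected arrays, and how its own algorithms build the patterns, is not shown here.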