Fault-Tolerant Message Switching Based on Wormhole Switching and Backtracking

  • Authors:
  • Manabu Sueishi; Masato Kitakami; Hideo Ito

  • Venue:
  • PRDC '04 Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'04)
  • Year:
  • 2004

Abstract

Parallel computers are now widely used for applications that require many calculations. In a NO Remote memory Access (NORA) model parallel computer, many processors are connected by communication links, and calculation results are obtained through communication among processors. The message switching method, which controls message transmission in the parallel computer, is one of the most important factors in the computer's performance. Since parallel computers include many processors, their failure rate is very high, and many fault-tolerant switching methods have been proposed. The existing methods have problems, however, such as low communication throughput, low fault-tolerant capability, and large hardware overhead. This paper proposes a fault-tolerant switching method that improves on wormhole switching. The proposed method inserts dummy flits, which carry no information, after the header flit, the first flit of the packet. By overwriting a dummy flit with the header flit, backtracking is implemented without hardware overhead. Computer simulation shows that in a 16 by 16 2D torus, for example, the throughput of the proposed method is almost equal to that of existing methods that require large hardware overhead, provided the number of faulty nodes is less than 40.
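
The dummy-flit idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the flit format or the router logic, so the names `make_packet` and `backtrack`, the tuple encoding of a flit, and the reading that one backtrack step consumes exactly one dummy flit are all assumptions made for illustration.

```python
# Hypothetical flit kinds; the paper's actual flit encoding is not given in the abstract.
HEADER, DUMMY, DATA = "H", "D", "B"


def make_packet(dest, payload, n_dummy):
    """Build a packet: header flit, then n_dummy dummy flits, then data flits."""
    header = [(HEADER, dest)]
    dummies = [(DUMMY, None)] * n_dummy
    data = [(DATA, item) for item in payload]
    return header + dummies + data


def backtrack(worm):
    """Retreat the header by one flit position (assumed semantics).

    worm is a list of flits in path order, with index 0 the header.
    The header content is copied over the dummy flit directly behind it,
    and the old header position is released. Returns False when no dummy
    flit is left behind the header, so the number of dummy flits bounds
    the number of backtrack steps.
    """
    if len(worm) < 2 or worm[1][0] != DUMMY:
        return False
    worm[1] = worm[0]  # overwrite the dummy flit with the header flit
    del worm[0]        # the slot the header used to occupy is freed
    return True


packet = make_packet(dest=(3, 4), payload=[10, 20], n_dummy=2)
steps = 0
while backtrack(packet):
    steps += 1
```

In this sketch a packet built with two dummy flits can retreat twice, after which the header sits directly in front of the data flits and no further backtracking is possible. No extra buffer is involved: the retreat reuses flit positions the packet already holds, which is one way to read the abstract's claim of backtracking without hardware overhead.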