Fault-Tolerant Communication with Partitioned Dimension-Order Routers

  • Authors:
  • Rajendra V. Boppana;Suresh Chalasani

  • Affiliations:
  • Univ. of Texas at San Antonio, San Antonio;TM Floyd & Company, Rosemont, IL

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

The current fault-tolerant routing methods require extensive changes to practical routers such as the Cray T3D's dimension-order router to handle faults. In this paper, we propose methods to handle faults in multicomputers with dimension-order routers with simple changes to router structure and logic. Our techniques can be applied to current implementations in which the router is partitioned into multiple modules and no centralized crossbar is used. We consider arbitrarily located faulty blocks and assume only local knowledge of faults. We apply our techniques for torus networks and show that, with as few as four virtual channels per physical channel, deadlock- and livelock-free routing can be provided even with multiple faults and multimodule implementation of routers. Our simulations of the proposed technique for 2D tori and mesh indicate that the performance degradation is similar to that seen in the case of cross-bar based designs previously proposed.