Performance analysis and fault tolerance of randomized routing on Clos networks

  • Authors:
  • M. Bhatia; A. Youssef

  • Venue:
  • FRONTIERS '96: Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
  • Year:
  • 1996

Abstract

Besides universality and very low latency, Youssef's randomized self-routing algorithms (1993) have high tolerance for multiple faults and, more strikingly, have the potential for fault tolerance without diagnosis. In this paper, we study the performance of Youssef's routing algorithms on faulty Clos networks with multiple faults in multiple columns, both with and without fault detection. We show that with fault detection and diagnosis, randomized routing algorithms provide scalable, very efficient, and fault-tolerant routing mechanisms. Without fault detection and diagnosis, randomized routing provides good fault tolerance when the faulty switches are confined to either the first or the second column. The delays become large for faults in the third column or for faults in more than one column. In conclusion, randomized routing enables the system to run without periodic fault detection and diagnosis; if and when performance degrades beyond a certain threshold, diagnosis can be performed to improve routing performance.