An efficient fault-tolerant routing methodology for fat-tree interconnection networks

  • Authors:
  • Crispín Gómez;María E. Gómez;Pedro López;José Duato

  • Affiliations:
  • Dept. of Computer Engineering, Universidad Politécnica de Valencia, Valencia, Spain;Dept. of Computer Engineering, Universidad Politécnica de Valencia, Valencia, Spain;Dept. of Computer Engineering, Universidad Politécnica de Valencia, Valencia, Spain;Dept. of Computer Engineering, Universidad Politécnica de Valencia, Valencia, Spain

  • Venue:
  • ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
  • Year:
  • 2007

Abstract

In large cluster-based machines, fault tolerance in the interconnection network is an issue of growing importance, since their increasing size raises the probability of failure. The topology used in these machines is usually a fat-tree. This paper proposes a new distributed fault-tolerant routing methodology for fat-trees. It does not require additional network hardware. It is scalable, since the required memory, switch hardware and routing delay do not depend on the network size. The methodology is based on enhancing the Interval Routing scheme with exclusion intervals. Exclusion intervals are associated with each switch output port, and represent the set of nodes that are unreachable from this port after a failure occurs. We propose a mechanism to identify the exclusion intervals that must be updated after detecting a failure, and the values to write to them. Our methodology is able to support a relatively high number of network failures with a low degradation in network performance.
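The idea of pairing each output port with an exclusion interval can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the register layout, the number of intervals per port, or the update mechanism, so the names `OutputPort`, `can_reach` and `candidate_ports`, the single exclusion interval per port, and the node numbering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Inclusive range of destination node identifiers (assumed representation).
Interval = Tuple[int, int]


def in_interval(node: int, interval: Interval) -> bool:
    """Return True if node lies inside the inclusive interval."""
    low, high = interval
    return low <= node <= high


@dataclass
class OutputPort:
    """Hypothetical model of a switch output port under interval routing."""
    name: str
    routing_interval: Interval                      # nodes normally reachable via this port
    exclusion_interval: Optional[Interval] = None   # nodes unreachable after a failure

    def can_reach(self, dest: int) -> bool:
        # A port is usable if the destination is in its routing interval
        # and has not been excluded because of a detected failure.
        if not in_interval(dest, self.routing_interval):
            return False
        if self.exclusion_interval is not None and in_interval(dest, self.exclusion_interval):
            return False
        return True


def candidate_ports(ports: List[OutputPort], dest: int) -> List[str]:
    """Names of all ports that can still be used to reach dest."""
    return [p.name for p in ports if p.can_reach(dest)]


# Two upward ports of an assumed 16-node fat-tree; both initially reach every node.
ports = [OutputPort("up0", (0, 15)), OutputPort("up1", (0, 15))]
before = candidate_ports(ports, 9)

# Assumed failure: nodes 8..11 are no longer reachable through up0.
ports[0].exclusion_interval = (8, 11)
after_excluded = candidate_ports(ports, 9)
after_unaffected = candidate_ports(ports, 3)
```

In this sketch the per-port check involves only two interval comparisons, regardless of how many nodes the network has, which is in line with the abstract's claim that memory and routing delay do not depend on network size.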