The Journal of Supercomputing
In large cluster-based machines, fault tolerance in the interconnection network is an issue of growing importance, since their increasing size raises the probability of failure. These machines usually use a fat-tree topology. This paper proposes a new distributed fault-tolerant routing methodology for fat-trees that requires no additional network hardware. It is scalable, since the required memory, switch hardware and routing delay do not depend on the network size. The methodology enhances the Interval Routing scheme with exclusion intervals. An exclusion interval is associated with each switch output port and represents the set of nodes that become unreachable through that port after a failure appears. We propose a mechanism to identify which exclusion intervals must be updated after a failure is detected, and the values to write into them. Our methodology tolerates a relatively high number of network failures with low degradation in network performance.
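The basic forwarding rule behind interval routing with exclusion intervals can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: node identifiers are integers, each output port stores one half-open routing interval and at most one exclusion interval, and the switch layout, port names (`up0`, `down1`, ...) and helper names are hypothetical. The paper's mechanism for deciding which exclusion intervals to update, and with which values, is not modelled here; the sketch simply writes the exclusion interval directly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical representation: a half-open [lo, hi) range of destination node ids.
Interval = Tuple[int, int]


def contains(interval: Optional[Interval], node: int) -> bool:
    """True if node lies in the half-open interval; None stands for an empty interval."""
    if interval is None:
        return False
    lo, hi = interval
    return lo <= node < hi


@dataclass
class OutputPort:
    name: str
    routing: Interval                      # destinations served by this port when fault-free
    exclusion: Optional[Interval] = None   # destinations unreachable via this port after a fault


class Switch:
    def __init__(self, ports: List[OutputPort]):
        self.ports = ports

    def candidate_ports(self, dest: int) -> List[str]:
        """Ports whose routing interval contains dest and whose exclusion interval does not."""
        return [p.name for p in self.ports
                if contains(p.routing, dest) and not contains(p.exclusion, dest)]

    def exclude(self, port_name: str, unreachable: Interval) -> None:
        """Mark `unreachable` as cut off from port_name.

        Assumption of this sketch: one exclusion interval per port, overwritten on update.
        """
        for p in self.ports:
            if p.name == port_name:
                p.exclusion = unreachable
                return
        raise KeyError(port_name)


# Hypothetical switch in a 16-node fat-tree: it reaches nodes 0-7 downward
# and nodes 8-15 through either of two upward ports.
sw = Switch([
    OutputPort("down0", (0, 4)),
    OutputPort("down1", (4, 8)),
    OutputPort("up0", (8, 16)),
    OutputPort("up1", (8, 16)),
])
print(sw.candidate_ports(13))  # ['up0', 'up1']
sw.exclude("up0", (12, 16))    # a fault above up0 cuts it off from nodes 12-15
print(sw.candidate_ports(13))  # ['up1']
print(sw.candidate_ports(9))   # ['up0', 'up1']
```

Because several upward ports in a fat-tree usually cover the same destinations, excluding nodes on one port still leaves an alternative path, and the per-port state stays constant in size regardless of how many nodes the network has.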