Tolerating faults in the interconnection network is of vital importance in today's huge computer installations. Still, the existing solutions fall short of being satisfactory. They require that the system fall back to a routing algorithm that is inferior to the original, either in terms of performance, or in terms of the need for virtual channels, or both. Furthermore, since dynamic reconfiguration is not supported in current hardware, existing methods require the system to be halted while reconfiguration takes place in order to avoid deadlocks. In this paper we present a method that efficiently generates a new routing function in the presence of faults. The new routing function reroutes only the traffic that is affected by the fault, so that the performance of the original routing function is preserved to the extent possible. No specific functionality in the switches is required; in the presence of faults, the method needs exactly the same number of virtual channels as the original routing algorithm. Finally, the new routing function is compatible with the old one, so that a deadlock-free dynamic transition between the old and the new routing function is immediately available. This means that our solution can easily be implemented on current InfiniBand platforms, e.g., through the OFED software stack. We demonstrate that the method is workable for meshes, tori, and fat-trees, and that it is able to guarantee one-fault tolerance for all of these topologies.
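To make the idea of rerouting only the affected traffic concrete, the following is a minimal sketch and not the method of the paper. It assumes a 2D mesh with dimension-order (XY) routing as the original routing function and replaces only those paths that cross a single failed link, using a plain breadth-first search over the surviving links. All names (`mesh_links`, `xy_route`, `uses_link`, `shortest_path`, `reroute_affected`) are hypothetical, and the sketch deliberately ignores deadlock freedom and virtual channels, which are the central concerns of the actual method.

```python
from collections import deque


def mesh_links(width, height):
    """Return the set of bidirectional links of a width x height 2D mesh."""
    links = set()
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < height:
                links.add(frozenset({(x, y), (x, y + 1)}))
    return links


def xy_route(src, dst):
    """Dimension-order (XY) path from src to dst as a list of nodes."""
    path = [src]
    x, y = src
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def uses_link(path, link):
    """True if any hop of the path traverses the given link."""
    return any(frozenset({a, b}) == link for a, b in zip(path, path[1:]))


def shortest_path(src, dst, links):
    """BFS shortest path over the given links; None if dst is unreachable."""
    neighbours = {}
    for link in links:
        a, b = tuple(link)
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def reroute_affected(routes, links, failed_link):
    """Keep every path that avoids the failed link; recompute only the rest.

    Assumption: BFS stands in for the paper's route generation and gives
    no deadlock-freedom guarantee.
    """
    surviving = links - {failed_link}
    new_routes = {}
    for (src, dst), path in routes.items():
        if uses_link(path, failed_link):
            new_routes[(src, dst)] = shortest_path(src, dst, surviving)
        else:
            new_routes[(src, dst)] = path
    return new_routes


if __name__ == "__main__":
    links = mesh_links(3, 3)
    nodes = [(x, y) for x in range(3) for y in range(3)]
    routes = {(s, d): xy_route(s, d) for s in nodes for d in nodes if s != d}
    failed = frozenset({(0, 0), (1, 0)})
    new_routes = reroute_affected(routes, links, failed)
    changed = sum(1 for k in routes if new_routes[k] != routes[k])
    print(f"{changed} of {len(routes)} paths were rerouted")
```

Paths that never touch the failed link are returned unchanged, which mirrors the stated goal of preserving the original routing function's performance to the extent possible. A faithful implementation would additionally have to choose the replacement paths so that they stay within the original number of virtual channels and remain compatible with the old routing function.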