A new and efficient mechanism to tolerate failures in interconnection networks for parallel and distributed computers, denoted as Immunet, is presented in this work. In the presence of failures, Immunet automatically reacts with a hardware reconfiguration of the surviving network resources. Immunet has four important advantages over previous fault-tolerant switching mechanisms. Its low hardware costs minimize the overhead that the network must support in the absence of faults. As long as the network remains connected, Immunet can tolerate any number of failures regardless of their spatial and temporal combinations. The resulting communication infrastructure provides optimized adaptive minimal routing over the surviving topology. The system behavior under successive failures exhibits graceful performance degradation. Immunet reconfiguration can be totally transparent to the applications running on the parallel system, as they will only be affected by the loss of those data packets circulating through the broken components. The rest of the packets will suffer only a tolerable delay induced by the time employed to perform the automatic network reconfiguration. Descriptions of the hardware network architecture and detailed synthetic and execution-driven simulations will demonstrate the benefits of Immunet.
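The one precondition the abstract states for fault tolerance is that the network remains connected. The sketch below illustrates only that precondition and is not Immunet's hardware reconfiguration mechanism, which the abstract does not detail. It assumes a k-ary n-cube (torus) topology purely for illustration, and the helper names `torus_links` and `is_connected` are hypothetical.

```python
from collections import deque
from itertools import product


def torus_links(k, n):
    """Return the undirected links of a k-ary n-cube (assumed topology)."""
    links = set()
    for node in product(range(k), repeat=n):
        for dim in range(n):
            neighbor = list(node)
            neighbor[dim] = (node[dim] + 1) % k  # wrap-around link
            neighbor = tuple(neighbor)
            if neighbor != node:  # k == 1 would give a self-loop; skip it
                links.add(frozenset((node, neighbor)))
    return links


def is_connected(nodes, links):
    """Breadth-first search: True if every surviving node is reachable."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {v: set() for v in nodes}
    for link in links:
        a, b = tuple(link)
        if a in nodes and b in nodes:  # ignore links touching failed nodes
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes


if __name__ == "__main__":
    nodes = set(product(range(4), repeat=2))  # 4-ary 2-cube, 16 nodes
    links = torus_links(4, 2)
    failed = {frozenset(((0, 0), (1, 0)))}    # a single failed link
    print(is_connected(nodes, links - failed))
```

In this toy model, a single link failure leaves the torus connected, whereas losing all four links of one node isolates it unless that node is also removed from the surviving set.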