A new method for fault-tolerant wormhole routing in arbitrary dimensional meshes is introduced. The method was motivated by certain routing requirements of an initial design of the Blue Gene supercomputer at IBM Research. The machine is organized as a three-dimensional mesh containing many thousands of nodes and the routing method should tolerate a few percent of the nodes being faulty. There has been much work on routing methods for meshes that route messages around faults or regions of faults. The new method is to declare certain nonfaulty nodes to be "lambs." A lamb is used for routing but not processing, so a lamb is neither the source nor the destination of a message. The lambs are chosen so that every "survivor node," a node that is neither faulty nor a lamb, can reach every survivor node by at most two rounds of dimension-ordered (such as e-cube) routing. An algorithm for finding a set of lambs is presented. The results of simulations on 2D and 3D meshes of various sizes with various numbers of random node faults are given. For example, on a 32 × 32 × 32 3D mesh with 3 percent random faults and using at most two rounds of e-cube routing for each message, the average number of lambs is less than 68, which is less than 7 percent of the number 983 of faults and less than 0.21 percent of the number 32,768 of nodes.
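The survivor condition described above can be illustrated with a small brute-force check. The sketch below is not the lamb-finding algorithm from the paper; it only verifies, for a given set of faults and a candidate set of lambs, that every survivor reaches every other survivor in at most two rounds of dimension-ordered routing. The function names (`ecube_path`, `one_round`, `two_rounds`, `survivors_connected`) are hypothetical, and it assumes that the intermediate node between the two rounds may be any nonfaulty node, lamb or survivor.

```python
from itertools import product

def ecube_path(src, dst):
    """Nodes visited by dimension-ordered routing: fix dimension 0, then 1, ..."""
    path = [src]
    cur = list(src)
    for d in range(len(src)):
        step = 1 if dst[d] > cur[d] else -1
        while cur[d] != dst[d]:
            cur[d] += step
            path.append(tuple(cur))
    return path

def one_round(src, dst, faulty):
    """True if the dimension-ordered path from src to dst avoids all faults."""
    return all(node not in faulty for node in ecube_path(src, dst))

def two_rounds(src, dst, faulty, nodes):
    """True if dst is reachable from src in at most two rounds.

    Assumption: the intermediate node may be any nonfaulty node.
    Choosing mid == src covers the single-round case.
    """
    return any(
        one_round(src, mid, faulty) and one_round(mid, dst, faulty)
        for mid in nodes
        if mid not in faulty
    )

def survivors_connected(shape, faulty, lambs):
    """Check the survivor condition for a mesh of the given shape."""
    nodes = list(product(*(range(k) for k in shape)))
    survivors = [n for n in nodes if n not in faulty and n not in lambs]
    return all(
        two_rounds(s, t, faulty, nodes)
        for s in survivors
        for t in survivors
    )

# A 3 x 3 mesh in which the corner (0, 0) is cut off by two faulty neighbors.
faulty = {(1, 0), (0, 1)}
print(survivors_connected((3, 3), faulty, lambs=set()))
print(survivors_connected((3, 3), faulty, lambs={(0, 0)}))
```

In this toy mesh the corner node cannot send to any other node, so the condition fails until that node is declared a lamb. The check enumerates all survivor pairs and all intermediate nodes, so it is only practical for small meshes.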