Component failures in high-speed computer networks can result in significant topological changes. In such cases, a network reconfiguration algorithm must be executed to restore connectivity between the network nodes. Most contemporary networks either use static reconfiguration algorithms or stop user traffic in order to prevent cyclic dependencies in the routing tables. This paper presents NetRec, a dynamic network reconfiguration algorithm that tolerates multiple node and link failures in high-speed networks with arbitrary topology. The algorithm updates the routing tables asynchronously and does not require any global knowledge of the network topology. Certain phases of NetRec are executed in parallel, which reduces the reconfiguration time. The algorithm suspends application traffic only in small regions of the network, and only while the routing tables are being updated. The message complexity of NetRec is analyzed, and the termination, liveness, and safety of the algorithm are proven. Additionally, the paper presents results from validating the algorithm in Distant, a distributed network-validation testbed based on the MPI 1.2 features for building arbitrary virtual topologies.
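To make the notion of "cyclic dependencies in the routing tables" concrete, the following is a minimal sketch of a channel-dependency check. It is not NetRec and is not taken from the paper: the function names `channel_dependencies` and `has_cycle`, the representation of routes as lists of node names, and the example topologies are all assumptions made for illustration. The idea shown is the standard one: treat each directed link as a channel, record which channel a route uses immediately after another, and test the resulting graph for cycles.

```python
from collections import defaultdict


def channel_dependencies(routes):
    """Build a channel dependency graph from a set of routes.

    Assumption for this sketch: each route is a list of node names, and a
    channel is a directed link (u, v). A dependency c1 -> c2 is recorded
    when some route uses channel c2 immediately after channel c1.
    """
    deps = defaultdict(set)
    for path in routes:
        channels = list(zip(path, path[1:]))
        for c1, c2 in zip(channels, channels[1:]):
            deps[c1].add(c2)
    return deps


def has_cycle(deps):
    """Return True if the dependency graph contains a cycle (DFS colouring)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every channel starts WHITE (unvisited)

    def visit(channel):
        colour[channel] = GREY  # channel is on the current DFS path
        for nxt in deps.get(channel, ()):
            if colour[nxt] == GREY:
                return True  # back edge: a cyclic dependency exists
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[channel] = BLACK  # fully explored, no cycle through it
        return False

    return any(colour[c] == WHITE and visit(c) for c in list(deps))


# Hypothetical example: routes that wrap around a four-node ring.
ring_routes = [
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["C", "D", "A"],
    ["D", "A", "B"],
]

# Hypothetical example: routes that fan out from A through B.
tree_routes = [
    ["A", "B", "C"],
    ["A", "B", "D"],
]

ring_is_cyclic = has_cycle(channel_dependencies(ring_routes))
tree_is_cyclic = has_cycle(channel_dependencies(tree_routes))
```

In this sketch the ring routes produce a cyclic dependency graph, while the tree routes do not. A reconfiguration scheme has to ensure that no such cycle appears while old and new routing-table entries coexist, which is why many networks fall back on static reconfiguration or stop user traffic during the update.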