IEEE Transactions on Computers
With the ever-increasing demands on server applications, reliability is of paramount importance. These services are often implemented using a distributed server cluster architecture in which many servers act together to provide end-user services. We evaluated one hundred deployed systems and found that, over a one-year period, thirteen percent of the hardware failures were network related. To provide end-user services reliably, server clusters must guarantee server-to-server communication in the presence of these network failures. We describe the Dynamic Routing System (DRS) protocol, which is designed to provide proactive dynamic routing for server cluster architectures, and present an analysis of its survivability in the presence of network failures. Our experiments show that, for an eight-node server cluster with three concurrent network failures, the DRS provides a 267% improvement in the probability of server-to-server communication over a traditional network topology. Additionally, the proactive routing approach of the DRS performs better than traditional routing systems because it fixes network problems before they affect application communication.