EUROMICRO-PDP '02 Proceedings of the 10th Euromicro Conference on Parallel, Distributed and Network-Based Processing
Networks of workstations (NOWs) are becoming an increasingly popular alternative to parallel computers for applications with high resource demands, such as memory capacity and input/output storage space, and for small-scale parallel computing. Although the mean time between failures (MTBF) of an individual link or switch in a NOW is very high, the probability that some component fails increases dramatically as the network grows. Moreover, external factors, such as accidental link disconnections, can also affect overall NOW reliability. Until the faulty element is replaced, the NOW operates in a degraded mode, so it becomes necessary to quantify how much global NOW performance is reduced while the system remains in this state. In this paper we analyze the performance degradation of networks of workstations when links or switches fail. Because the routing algorithm is a key issue in the design of a NOW, we quantify the sensitivity to failures of two routing algorithms: up*/down* routing and minimal adaptive routing. Simulation results show that, in general, up*/down* routing is highly robust to failures. The minimal adaptive routing algorithm, on the other hand, achieves better performance, even in the presence of failures, but at the expense of greater sensitivity to them.
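To make the up*/down* rule concrete, the following is a minimal sketch and not the simulator used in the paper. It assumes the usual formulation of up*/down* routing: a breadth-first spanning tree is built from a root switch, the "up" end of each link is the end closer to the root (ties broken by lower switch ID), and a legal route takes zero or more up links followed by zero or more down links. The function names (`bfs_levels`, `is_up`, `updown_distance`, `remove_link`) and the four-switch ring topology are hypothetical, chosen only for illustration.

```python
from collections import deque


def bfs_levels(adj, root):
    """Return the BFS distance from the root for every reachable switch."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(level, u, v):
    """True if crossing the link u -> v moves toward the root ('up').

    Assumption: ties between switches on the same level are broken by
    switch ID, the lower ID being the 'up' end.
    """
    return (level[v], v) < (level[u], u)


def updown_distance(adj, root, src, dst):
    """Hop count of the shortest legal up*/down* route, or None if none exists."""
    level = bfs_levels(adj, root)
    if src not in level or dst not in level:
        return None
    # Search state: (switch, gone_down). After the first down link,
    # up links are forbidden, which is what keeps the routing deadlock-free.
    start = (src, False)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u, gone_down = queue.popleft()
        if u == dst:
            return dist[(u, gone_down)]
        for v in adj[u]:
            up = is_up(level, u, v)
            if gone_down and up:
                continue  # illegal: an up link after a down link
            state = (v, gone_down or not up)
            if state not in dist:
                dist[state] = dist[(u, gone_down)] + 1
                queue.append(state)
    return None


def remove_link(adj, a, b):
    """Return a copy of the topology with the bidirectional link a-b failed."""
    return {u: [v for v in nbrs if {u, v} != {a, b}] for u, nbrs in adj.items()}


# Hypothetical four-switch ring: 0-1, 1-3, 3-2, 2-0, with switch 0 as root.
RING = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

if __name__ == "__main__":
    print(updown_distance(RING, 0, 1, 0))                     # 1
    print(updown_distance(remove_link(RING, 0, 1), 0, 1, 0))  # 3
```

In this toy ring, failing the link between switches 0 and 1 forces traffic from switch 1 to the root around the remaining three links, so the route grows from one hop to three. The sketch recomputes the spanning tree after the failure, which assumes the network is reconfigured; it says nothing about latency or throughput, which the paper evaluates by simulation.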