Fault-tolerance mechanisms become indispensable as the number of processors in large systems increases. The effectiveness of such mechanisms must be measured before they are implemented, which in turn requires understanding how different network parameters affect dependability measures such as mean time to network failure and availability. In this paper we analyse these effects in detail using a methodology we proposed previously, based on Markov chains and Analysis of Variance techniques. As a case study, we analyse the effects of network size, mean time to node failure, mean time to node repair, mean time to network repair and failure coverage on a 2D mesh network with a fault-tolerant mechanism (similar to the one used in the BlueGene/L system) that can remove rows and/or columns in the presence of failures.
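To make the role of the parameters concrete, the following is a minimal sketch of how a mean time to network failure can be obtained from an absorbing continuous-time Markov chain. It is not the model used in the paper: the state space (number of tolerated faults), the single repair facility, the exponential failure and repair times, and the names `mean_time_to_failure` and `max_faults` are all assumptions made for illustration. Network repair and the Analysis of Variance step are not covered.

```python
def mean_time_to_failure(n_nodes, mttf_node, mttr_node, coverage, max_faults):
    """Mean time to absorption, starting fault-free, of a toy absorbing CTMC.

    Hypothetical model (not the paper's): state k = number of currently
    tolerated node faults, 0 <= k <= max_faults < n_nodes. A node failure is
    handled with probability `coverage` (k -> k+1); otherwise, or when it
    occurs in state max_faults, the network fails (absorbing state).
    """
    lam = 1.0 / mttf_node  # per-node failure rate
    mu = 1.0 / mttr_node   # repair rate, one repair at a time (assumption)
    m = max_faults + 1     # number of transient states

    # Build A = -Q_T (generator restricted to transient states) and b = 1,
    # so the vector t of mean absorption times satisfies A t = b.
    a = [[0.0] * m for _ in range(m)]
    b = [1.0] * m
    for k in range(m):
        fail = (n_nodes - k) * lam       # total failure rate in state k
        rep = mu if k > 0 else 0.0       # repair only if something is faulty
        a[k][k] = fail + rep
        if k + 1 < m:
            a[k][k + 1] = -coverage * fail  # covered failure: k -> k+1
        if k > 0:
            a[k][k - 1] = -rep              # repair: k -> k-1

    # Solve A t = b by Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for j in range(col, m):
                a[r][j] -= factor * a[col][j]
            b[r] -= factor * b[col]
    t = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(a[r][j] * t[j] for j in range(r + 1, m))
        t[r] = (b[r] - s) / a[r][r]
    return t[0]  # start in the fault-free state


if __name__ == "__main__":
    # Illustrative inputs only: a 4x4 mesh, times in hours.
    for c in (0.90, 0.99):
        print(c, mean_time_to_failure(16, 1000.0, 10.0, c, 2))
```

Sweeping one argument while holding the others fixed, as in the loop above, gives a rough feel for how sensitive the result is to each parameter. A factorial design analysed with Analysis of Variance would quantify those effects and their interactions.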