This paper proposes analytical techniques for predicting the reliability and availability of mesh-connected systems. The models are based on submesh requirements. First, a reliability model is proposed under the assumption that a submesh can always be recognized if it exists. The analysis of the linear consecutive n-out-of-N system is extended, using an expanding row/column technique, to evaluate submesh reliability. An alternative approach called row folding is also discussed. Because computing the exact reliability is highly complex, both techniques use approximations to estimate lower bounds. Next, submesh reliability is computed under two allocation policies: the two-dimensional buddy system (TDBS) and frame sliding (FS). The TDBS model is further extended to estimate the reliability of multiple working submeshes, which is useful in a multiuser environment. Availability analysis for a submesh of the required size is conducted using a Markov chain (MC). State truncation is used to reduce computation time, and the MC is solved using the HARP software package. The analytical models are validated through extensive simulation. Issues such as reliability comparison across allocation policies and methods for improving system reliability are addressed using the analytical models.
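The one-dimensional building block named in the abstract, the linear consecutive n-out-of-N system, can be illustrated with a short sketch. It is not the paper's expanding row/column or row-folding technique, which the abstract does not specify. It assumes that the system works when at least n consecutive components out of N are fault-free, and that components fail independently with a common working probability p. The function name `consecutive_run_reliability` and its parameters are hypothetical.

```python
def consecutive_run_reliability(N: int, n: int, p: float) -> float:
    """Probability that a line of N independent components, each working
    with probability p, contains at least n consecutive working ones.

    Assumption: this "at least n consecutive fault-free" reading of the
    linear consecutive n-out-of-N system is an interpretation, not taken
    verbatim from the paper.
    """
    if n <= 0:
        return 1.0
    if n > N:
        return 0.0
    q = 1.0 - p
    # state[j] = probability that no run of n working components has been
    # seen so far and the current trailing run of working components has
    # length j (0 <= j < n).
    state = [0.0] * n
    state[0] = 1.0
    for _ in range(N):
        nxt = [0.0] * n
        for j, prob in enumerate(state):
            nxt[0] += prob * q          # a faulty component resets the run
            if j + 1 < n:
                nxt[j + 1] += prob * p  # the run grows but is still short
            # otherwise the run reaches length n: that mass counts as
            # success and is dropped from the "not yet" states
        state = nxt
    return 1.0 - sum(state)


if __name__ == "__main__":
    # Example: chance that 16 rows contain 4 consecutive fault-free rows
    # when each row is fault-free with probability 0.9 (illustrative values).
    print(consecutive_run_reliability(16, 4, 0.9))
```

The recurrence runs in O(N·n) time. Treating each row (or column) of a mesh as a single "component" in this way is only a rough one-dimensional analogy for the submesh problem. The two-dimensional case is what makes the exact computation expensive and motivates the lower-bound approximations mentioned above.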