The Robust-Algorithm Approach to Fault Tolerance on Processor Arrays: Fault Models, Fault Diameter, and Basic Algorithms

Authors:
Affiliations:
Venue: IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Year: 1998

Abstract

With few exceptions, the two issues of algorithm design and fault tolerance for processor arrays have been dealt with separately: algorithm developers have assumed the availability of complete fault-free arrays, and fault tolerance techniques have aimed at restoring such complete arrays by reconfiguring faulty ones. We present the design of robust algorithms that run efficiently on complete arrays but also tolerate faulty processors/links in a degraded mode. This is a complementary approach in that our algorithms can be used on reconfigurable arrays that tolerate a certain number of faults while maintaining their regularity, with the graceful degradation feature kicking in once the fault tolerance limit of the reconfiguration scheme is exceeded. In the fault models considered in this paper, the faulty processors/links are either removed from the pool of resources (removal model) or bypassed in their respective rows/columns (bypass model). We discuss the two models, derive tight upper bounds for the fault diameter of the resulting networks, and present building-block algorithms for semigroup computation, parallel prefix computation, data rearrangement, matrix multiplication, and sorting.
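To make the two fault models concrete, the following minimal sketch compares the diameter of the surviving network under each model for one given fault pattern. It is an illustration under stated assumptions, not the paper's method: it assumes a 2D mesh without wraparound links, considers processor faults only, and uses a brute-force breadth-first search. The names `healthy_neighbors`, `eccentricity`, and `surviving_diameter` are hypothetical. A fault diameter in the usual sense would be the worst case of this quantity over all fault patterns of interest.

```python
from collections import deque


def healthy_neighbors(node, rows, cols, faulty, model):
    """Yield the healthy neighbors of `node` in a rows x cols mesh (no wraparound).

    model == "removal": a faulty neighbor is simply unavailable.
    model == "bypass":  faulty processors are skipped along the row/column,
                        so the next healthy processor becomes the neighbor.
    (Both readings are assumptions based on the abstract's one-line definitions.)
    """
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if model == "bypass":
            # Walk past consecutive faulty processors in this direction.
            while 0 <= nr < rows and 0 <= nc < cols and (nr, nc) in faulty:
                nr, nc = nr + dr, nc + dc
        if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in faulty:
            yield (nr, nc)


def eccentricity(src, rows, cols, faulty, model):
    """BFS from `src`; return (largest distance found, number of nodes reached)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in healthy_neighbors(u, rows, cols, faulty, model):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values()), len(dist)


def surviving_diameter(rows, cols, faulty, model="removal"):
    """Diameter of the healthy part of the mesh, or None if it is disconnected."""
    faulty = set(faulty)
    healthy = [(r, c) for r in range(rows) for c in range(cols)
               if (r, c) not in faulty]
    if not healthy:
        return None
    best = 0
    for src in healthy:
        ecc, reached = eccentricity(src, rows, cols, faulty, model)
        if reached != len(healthy):
            return None  # some healthy processor cannot be reached
        best = max(best, ecc)
    return best


if __name__ == "__main__":
    center_fault = {(1, 1)}
    print(surviving_diameter(3, 3, set()))                  # fault-free 3 x 3 mesh
    print(surviving_diameter(3, 3, center_fault, "removal"))
    print(surviving_diameter(3, 3, center_fault, "bypass"))
    # A linear array is cut in two by a removed processor but not by a bypassed one.
    print(surviving_diameter(1, 5, {(0, 2)}, "removal"))
    print(surviving_diameter(1, 5, {(0, 2)}, "bypass"))
```

In this sketch a single fault in the middle of a five-processor linear array disconnects the array under the removal model, whereas under the bypass model the four healthy processors still form a shorter linear array. That contrast is the kind of difference between the two models that the fault diameter bounds are meant to capture.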