The Robust-Algorithm Approach to Fault Tolerance on Processor Arrays: Fault Models, Fault Diameter, and Basic Algorithms

  • Venue: IPPS '98: Proceedings of the 12th International Parallel Processing Symposium
  • Year: 1998

Abstract

With few exceptions, the two issues of algorithm design and fault tolerance for processor arrays have been dealt with separately: algorithm developers have assumed the availability of complete fault-free arrays, and fault tolerance techniques have aimed at restoring such complete arrays by reconfiguring faulty ones. We present the design of robust algorithms that run efficiently on complete arrays but are also tolerant of faulty processors/links in a degraded mode. This is a complementary approach in that our algorithms can be used on reconfigurable arrays that tolerate a certain number of faults while maintaining their regularity, with the graceful degradation feature kicking in once the fault tolerance limit of the reconfiguration scheme is exceeded. The fault models considered in this paper consist of the faulty processors/links being removed from the pool of resources (removal model) or bypassed in their respective rows/columns (bypass model). We discuss the two models, derive tight upper bounds for the fault diameter of the resulting networks, and present building-block algorithms for semigroup computation, parallel prefix computation, data rearrangement, matrix multiplication, and sorting.
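
To make the two fault models concrete, the following is a minimal illustrative sketch, not the paper's method. It assumes a 2D mesh without wraparound links and processor faults only, and it measures the diameter of the surviving network by brute-force breadth-first search. The function names (`neighbors`, `distances_from`, `diameter`) and the `model` parameter are hypothetical and chosen for this illustration. Under the removal model a faulty processor simply disappears from the graph; under the bypass model each healthy processor is linked to the next healthy processor in the same row or column.

```python
from collections import deque


def neighbors(node, rows, cols, faulty, model):
    """Yield the healthy neighbors of `node` under the given fault model."""
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if model == "bypass":
            # Bypass model: skip over faulty processors in the same row/column.
            while 0 <= nr < rows and 0 <= nc < cols and (nr, nc) in faulty:
                nr, nc = nr + dr, nc + dc
        # Removal model: a faulty neighbor is simply unavailable.
        if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in faulty:
            yield (nr, nc)


def distances_from(source, rows, cols, faulty, model):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node, rows, cols, faulty, model):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(rows, cols, faulty=(), model="removal"):
    """Diameter of the surviving mesh, or None if the faults disconnect it."""
    faulty = set(faulty)
    healthy = [(r, c) for r in range(rows) for c in range(cols)
               if (r, c) not in faulty]
    worst = 0
    for source in healthy:
        dist = distances_from(source, rows, cols, faulty, model)
        if len(dist) < len(healthy):
            return None  # some healthy processor is unreachable
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    print(diameter(4, 4))                          # fault-free 4x4 mesh
    print(diameter(4, 4, {(1, 1)}, "removal"))     # one interior fault, removed
    print(diameter(4, 4, {(1, 1)}, "bypass"))      # one interior fault, bypassed
```

In this toy setting, a single interior fault in a 4x4 mesh leaves the diameter at 6 under the removal model, while the bypass links shorten it to 5. A 1x3 array with a faulty middle processor is disconnected under removal but remains connected under bypass. The exhaustive search is only meant for small arrays; it says nothing about the tight upper bounds derived in the paper.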