Failure detection algorithms for a reliable execution of parallel programs

  • Authors:
  • S. Chabridon; E. Gelenbe

  • Venue:
  • SRDS '95 Proceedings of the 14th Symposium on Reliable Distributed Systems
  • Year:
  • 1995

Abstract

We report on the design and simulation of novel algorithms that ensure application software runs correctly on a MIMD system in which processing units (PUs) can fail. We evaluate the effect of these algorithms by simulation on random task graphs as failure rates increase. We also examine a specific application, the Fast Fourier Transform, for which we construct the task graph and then simulate its execution under various processor failure rates.
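
To make the kind of experiment described in the abstract concrete, here is a minimal sketch of building an FFT task graph and simulating its execution on unreliable processing units. It is not the authors' algorithm: the abstract does not describe their detection or recovery scheme. The sketch assumes a radix-2 butterfly dependency structure, unit-time tasks, independent per-attempt failures, and simple re-execution of a failed task. The names `fft_task_graph`, `simulate`, `num_pus`, and `fail_prob` are hypothetical and chosen only for illustration.

```python
import random


def fft_task_graph(n):
    """Return {task: set of predecessor tasks} for an n-point radix-2 butterfly.

    A task is a pair (stage, position). Stage 0 holds the n input tasks;
    task (s, p) with s >= 1 needs (s-1, p) and its butterfly partner.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1  # log2(n)
    deps = {(0, p): set() for p in range(n)}
    for s in range(1, stages + 1):
        for p in range(n):
            partner = p ^ (1 << (s - 1))
            deps[(s, p)] = {(s - 1, p), (s - 1, partner)}
    return deps


def simulate(deps, num_pus, fail_prob, seed=0):
    """Return the number of unit time steps needed to finish every task.

    Assumed model (not taken from the paper): at each step, up to num_pus
    ready tasks run; each attempt fails independently with probability
    fail_prob, and a failed task simply stays ready and is retried later.
    """
    if num_pus < 1 or not 0.0 <= fail_prob < 1.0:
        raise ValueError("need num_pus >= 1 and 0 <= fail_prob < 1")
    rng = random.Random(seed)
    done = set()
    steps = 0
    while len(done) < len(deps):
        # A task is ready once all of its predecessors have completed.
        ready = sorted(t for t in deps if t not in done and deps[t] <= done)
        for task in ready[:num_pus]:
            if rng.random() >= fail_prob:  # the attempt succeeded
                done.add(task)
        steps += 1
    return steps


if __name__ == "__main__":
    graph = fft_task_graph(8)
    for rate in (0.0, 0.1, 0.3):
        print(rate, simulate(graph, num_pus=8, fail_prob=rate, seed=1))
```

With no failures and at least as many processing units as points, the completion time equals the number of stages in the graph; raising `fail_prob` shows how the completion time grows under this simplified retry model.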