Fault-Tolerant Parallel Applications Using Queues and Actions

  • Authors: J. A. Smith; Santosh K. Shrivastava
  • Venue: ICPP '97 Proceedings of the International Conference on Parallel Processing
  • Year: 1997

Abstract

There are many techniques supporting execution of large computations over a network of workstations (NOW), but data-intensive computations are usually run on high-performance parallel machines. A NOW comprising individual users' machines typically has a low-performance interconnect and suffers arbitrary changes in availability. Exploiting such resources to execute data-intensive computations is difficult, but even in a more constrained environment there is an unfulfilled need for fault tolerance. The structuring approach presented fulfills this need. Performance exceeding 100 Mflop/s is demonstrated for large, fault-tolerant, out-of-core examples of matrix multiplication and Cholesky factorisation using five 133 MHz Pentium compute machines.
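
The abstract does not spell out the structuring approach, so the following is only a rough sketch of the general idea the title suggests: work items sit in a queue, and each item is processed inside an all-or-nothing action, so a worker failure puts the item back rather than losing it. It assumes a bag-of-tasks decomposition of a small matrix product. Every name here (`WorkerCrash`, `run_action`, `run_all`, `flaky_entry`) is hypothetical, and the in-memory `queue.Queue` merely stands in for a queue that would have to survive machine failures in a real system.

```python
import queue


class WorkerCrash(Exception):
    """Simulated failure of a compute machine part-way through a task."""


def run_action(tasks, results, compute):
    """Process one task as an all-or-nothing step (hypothetical helper).

    Returns True if a task was committed, False if the queue was empty.
    """
    try:
        task = tasks.get_nowait()
    except queue.Empty:
        return False
    try:
        value = compute(task)
    except WorkerCrash:
        tasks.put(task)  # abort: the task goes back, no result is recorded
        raise
    results[task] = value  # commit: the result is recorded, the task is consumed
    return True


def run_all(tasks, results, compute):
    """Keep running actions until the queue drains; count simulated crashes."""
    crashes = 0
    while True:
        try:
            if not run_action(tasks, results, compute):
                return crashes
        except WorkerCrash:
            crashes += 1


# Illustrative data: one task per entry of C = A * B.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
failed_once = set()


def flaky_entry(task):
    """Compute one entry of C, crashing the first time task (0, 1) is tried."""
    i, j = task
    if task == (0, 1) and task not in failed_once:
        failed_once.add(task)
        raise WorkerCrash(task)
    return sum(A[i][k] * B[k][j] for k in range(2))


tasks = queue.Queue()
for i in range(2):
    for j in range(2):
        tasks.put((i, j))

results = {}
crashes = run_all(tasks, results, flaky_entry)
C = [[results[(i, j)] for j in range(2)] for i in range(2)]
print(C, "crashes:", crashes)
```

In this sketch the crashed task is simply retried later, so the final product is complete despite the failure. A real deployment would additionally need the queue contents and committed results to be stored where they outlive any single machine, which this example does not attempt.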