A System for Fault-Tolerant Execution of Data and Compute Intensive Programs Over a Network of Workstations

  • Authors:
  • J. Smith;S. K. Shrivastava

  • Venue:
  • A System for Fault-Tolerant Execution of Data and Compute Intensive Programs Over a Network of Workstations
  • Year:
  • 1996

Abstract

A well-known structuring technique for a wide class of parallel applications is the bag of tasks, which allows a computation to be partitioned dynamically among a collection of concurrent processes. This paper describes a fault-tolerant implementation of this structure that uses atomic actions (atomic transactions) to operate on persistent objects, which are accessed in a distributed setting via Remote Procedure Call (RPC). The system is suited to the parallel execution of data and compute intensive programs that require persistent storage and fault tolerance. Its suitability is examined through the measured performance of three applications: ray tracing, matrix multiplication and Cholesky factorization. The system runs on stock hardware and software platforms, specifically UNIX and C++.
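
To make the bag-of-tasks structure concrete, the following is a minimal single-machine sketch in C++ and not the system described in the paper. It assumes that each task is one row of a matrix product, that worker threads stand in for processes on separate workstations, and that a mutex stands in for the atomic actions. Persistence, RPC and recovery from failures are not modeled. The names TaskBag, take and multiply_with_bag are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the persistent, transactional bag.
// A mutex replaces the atomic actions; nothing here survives a crash.
class TaskBag {
public:
    // Fill the bag with task identifiers 0 .. count-1.
    explicit TaskBag(std::size_t count) {
        for (std::size_t i = count; i > 0; --i) tasks_.push_back(i - 1);
    }

    // Remove and return one task, or std::nullopt when the bag is empty.
    std::optional<std::size_t> take() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return std::nullopt;
        std::size_t task = tasks_.back();
        tasks_.pop_back();
        return task;
    }

private:
    std::mutex mutex_;
    std::vector<std::size_t> tasks_;
};

using Matrix = std::vector<std::vector<double>>;

// Compute C = A * B, where each task in the bag is one row of C.
// Assumes A and B are rectangular with compatible dimensions.
Matrix multiply_with_bag(const Matrix& a, const Matrix& b, unsigned workers) {
    const std::size_t n = a.size();
    const std::size_t k = b.size();
    const std::size_t m = b.empty() ? 0 : b[0].size();
    if (workers == 0) workers = 1;

    Matrix c(n, std::vector<double>(m, 0.0));
    TaskBag bag(n);

    // Each worker repeatedly takes a row index until the bag is empty,
    // so faster workers naturally end up processing more rows.
    auto worker = [&]() {
        while (auto row = bag.take()) {
            std::vector<double> result(m, 0.0);
            for (std::size_t j = 0; j < m; ++j)
                for (std::size_t p = 0; p < k; ++p)
                    result[j] += a[*row][p] * b[p][j];
            // Each row is taken by exactly one worker, so this write
            // touches a distinct element of c and needs no extra lock.
            c[*row] = std::move(result);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return c;
}
```

Calling multiply_with_bag(a, b, 3) with two 2x2 matrices starts three workers that compete for two row tasks. The sketch needs C++17 for std::optional and, on most UNIX toolchains, linking with thread support. In the fault-tolerant setting the abstract describes, taking a task and storing its result would be performed under atomic actions on persistent objects rather than under a mutex.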