Failure-tolerant parallel programming and its supporting system architecture

  • Authors:
  • K. H. Kim; C. V. Ramamoorthy

  • Affiliations:
  • University of Southern California, Los Angeles, California; University of California, Berkeley, California

  • Venue:
  • AFIPS '76 Proceedings of the June 7-10, 1976, national computer conference and exposition
  • Year:
  • 1976

Abstract

The state of the art in software validation, together with the continuing growth in the size and complexity of software subsystems, more than justifies the extra cost of software error tolerance. A program that incorporates software redundancy, i.e., one in which procedures for run-time validation and recovery are explicitly specified, is generally called a failure-tolerant program. One problem in failure-tolerant programming, which can be particularly serious in real-time computing environments, is the increase in program execution time caused by the validation and recovery procedures. This paper introduces an approach to this problem, called failure-tolerant parallel programming. The essence of the approach is to maximally overlap main-stream computation with the redundant computation performed for validation and recovery. A model system architecture tailored for efficient execution of failure-tolerant parallel programs is then described. It is highly general and modular, and it contains a novel memory subsystem named the duplex memory. Directions for further research on program structuring and on expansion of the model architecture are also indicated.
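
The overlap idea in the abstract can be illustrated with a small sketch. This is not the paper's notation or architecture; it is a hypothetical Python rendering that assumes each program stage is a (primary, acceptance test, alternate) triple, that the acceptance test for one stage runs on a worker thread while the next stage proceeds speculatively, and that a failed test causes the speculative work to be discarded and the alternate to be run on the saved input. The names `run_overlapped` and `stages` are invented for illustration, and the alternate's output is trusted without further validation to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor


def run_overlapped(stages, x, pool):
    """Run stages in order, validating each output while the next stage runs.

    stages: list of hypothetical (primary, acceptance_test, alternate) triples.
    acceptance_test(inp, out) returns True when out is acceptable for inp.
    """
    pending = None  # (validation future, stage index, saved input of that stage)
    i = 0
    while i < len(stages):
        primary = stages[i][0]
        y = primary(x)  # main-stream computation, speculative if a check is pending
        if pending is not None:
            check, j, saved = pending
            pending = None
            if not check.result():
                # Stage j produced a bad result: discard the speculative work,
                # recover from the saved input, and resume after stage j.
                x = stages[j][2](saved)
                i = j + 1
                continue
        # Validate stage i's output on a worker thread while stage i+1 runs.
        pending = (pool.submit(stages[i][1], x, y), i, x)
        x = y
        i += 1
    if pending is not None:
        check, j, saved = pending
        if not check.result():
            x = stages[j][2](saved)
    return x


# Usage: stage 0 has a faulty primary (adds 1 instead of doubling).
stages = [
    (lambda v: v + 1, lambda a, b: b == 2 * a, lambda v: 2 * v),
    (lambda v: v + 3, lambda a, b: b == a + 3, lambda v: v + 3),
]
with ThreadPoolExecutor(max_workers=2) as pool:
    result = run_overlapped(stages, 5, pool)
```

In this run the fault in stage 0 is detected only after stage 1 has already started, so stage 1's speculative result is thrown away and recomputed from the recovered value. When no fault occurs, the validation cost is hidden behind the main-stream computation, which is the time saving the abstract describes.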