Failure-tolerant parallel programming and its supporting system architecture

Authors:
K. H. Kim; C. V. Ramamoorthy
Affiliations:
University of Southern California, Los Angeles, California; University of California, Berkeley, California
Venue:
AFIPS '76 Proceedings of the June 7-10, 1976, national computer conference and exposition
Year:
1976

Abstract

The current state of the art in software validation, together with the continuing growth in the size and complexity of software subsystems, makes the extra cost of software error tolerance more than justified. A program that incorporates software redundancy, i.e., one in which procedures for run-time validation and recovery are explicitly specified, is generally called a failure-tolerant program. One problem in failure-tolerant programming, potentially serious in real-time computing environments, is the increase in program execution time caused by the added validation and recovery procedures. This paper introduces an approach to this problem, called failure-tolerant parallel programming. The essence of the approach is to overlap, as much as possible, main-stream computation with the redundant computation devoted to validation and recovery. A model system architecture tailored to the efficient execution of failure-tolerant parallel programs is then described. The architecture is highly general and modular, and includes a novel memory subsystem named the duplex memory. Directions for further research on program structuring and on extending the model architecture are also indicated.
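To make the overlap idea concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' scheme and does not model the duplex memory. The function `run_overlapped` and its parameters `steps`, `accept`, and `alternate` are hypothetical. The validation/recovery structure (an acceptance test per step, with an alternate version used for recovery) is borrowed from the recovery-block style of failure-tolerant programming. While the main thread computes step i, the acceptance test of step i-1 runs on a separate validator thread; if that test fails, the speculative result of step i is discarded, step i-1 is redone with its alternate, and step i is recomputed. The sketch assumes step states are immutable values and that alternates are trusted (not themselves tested). Under CPython's GIL, CPU-bound work gains little real speedup, so this shows the control structure rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor


def run_overlapped(state, steps, accept, alternate):
    """Hypothetical sketch: overlap each step's acceptance test with the next step.

    steps[i](state) -> new state            (primary version of step i)
    accept(i, old, new) -> bool             (acceptance test for step i)
    alternate(i, old) -> new state          (alternate used for recovery; assumed trusted)
    States are assumed immutable, so sharing them across threads is safe.
    Returns (final_state, indices_of_recovered_steps).
    """
    recovered = []
    with ThreadPoolExecutor(max_workers=1) as validator:
        pending = None  # (index, input_state, future) of the step still under test
        i = 0
        while i < len(steps):
            # Main-stream computation of step i runs while the validator
            # thread is still testing step i - 1 (if any).
            candidate = steps[i](state)
            if pending is not None:
                j, prev_input, test = pending
                pending = None
                if not test.result():
                    # Step j failed: discard the speculative result of step i,
                    # recover step j from its saved input, then redo step i.
                    state = alternate(j, prev_input)
                    recovered.append(j)
                    continue
            # Hand step i's test to the validator and keep going.
            pending = (i, state, validator.submit(accept, i, state, candidate))
            state = candidate
            i += 1
        # The last step has nothing to overlap with, so wait for its test here.
        if pending is not None:
            j, prev_input, test = pending
            if not test.result():
                state = alternate(j, prev_input)
                recovered.append(j)
    return state, recovered
```

For example, with three doubling steps where the second has an injected fault (`lambda s: s * 2 + 1000`), an acceptance test `new == 2 * old`, and an alternate `2 * old`, starting from 1 the call returns `(8, [1])`: the fault in step 1 is detected while step 2 is already running, step 1 is recovered, and step 2 is recomputed from the corrected state.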