Programmer-Transparent Coordination of Recovering Concurrent Processes: Philosophy and Rules for Efficient Implementation

Authors:
K. H. Kim
Venue:
IEEE Transactions on Software Engineering
Year:
1988

Abstract

An approach to coordinating cooperating concurrent processes, each capable of error detection and recovery, is presented. Error detection, rollback, and retry in a process are specified by a well-structured language construct called the recovery block. The recovery points of processes must be properly coordinated to prevent a disastrous avalanche of process rollbacks (commonly called the domino effect). The approach relies on an intelligent processor system (the system that runs the processes) capable of establishing and discarding the recovery points of interacting processes in a well-coordinated manner, such that a process never makes two consecutive rollbacks without a retry between them, and every process rollback is a minimum-distance rollback. Following a discussion of the philosophy underlying the author's approach, basic rules for reducing the storage and time overhead of such a processor system are discussed. Examples are drawn from systems in which processes communicate through monitors.
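The recovery block named in the abstract (in its usual form, "ensure <acceptance test> by <primary> else by <alternate> ... else error") can be sketched for a single process as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the process state is assumed to be a Python dictionary, the recovery point is assumed to be a deep copy taken on block entry, and an alternate that raises an exception is treated as a detected error. The names `recovery_block`, `faulty_sort`, `simple_sort`, and `is_sorted_permutation` are hypothetical. The sketch does not model the paper's central topic, the coordination of recovery points across interacting processes, nor its overhead-reduction rules.

```python
import copy
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # assumption: a process state is a plain dictionary


class RecoveryBlockFailure(Exception):
    """Raised when every alternate has been tried and rejected."""


def recovery_block(state: State,
                   alternates: List[Callable[[State], None]],
                   acceptance_test: Callable[[State], bool]) -> State:
    """ensure acceptance_test by alternates[0] else by alternates[1] ... else error."""
    # Establish the recovery point: a snapshot of the state on block entry.
    recovery_point = copy.deepcopy(state)
    for alternate in alternates:
        # Rollback before each try: every attempt starts from the recovery
        # point, so a rejected alternate leaves no trace in the next one.
        working = copy.deepcopy(recovery_point)
        try:
            alternate(working)
        except Exception:
            continue  # a raised exception counts as a detected error
        # Error detection: the acceptance test judges the attempt's result.
        if acceptance_test(working):
            return working  # success: the recovery point can be discarded
    raise RecoveryBlockFailure("all alternates failed the acceptance test")


def faulty_sort(s: State) -> None:
    """Hypothetical faulty primary: corrupts and fails to sort its input."""
    s["items"].append(99)
    s["items"] = s["items"][::-1]


def simple_sort(s: State) -> None:
    """Hypothetical alternate: a simpler, trusted algorithm."""
    s["items"] = sorted(s["items"])


def is_sorted_permutation(original: List[int]) -> Callable[[State], bool]:
    """Acceptance test: the output must be the sorted original items."""
    def test(s: State) -> bool:
        return s["items"] == sorted(original)
    return test


if __name__ == "__main__":
    data = [3, 1, 2]
    result = recovery_block({"items": list(data)},
                            [faulty_sort, simple_sort],
                            is_sorted_permutation(data))
    print(result["items"])  # [1, 2, 3]
```

Running the module prints `[1, 2, 3]`: the faulty primary is rejected by the acceptance test, the state is rolled back to the recovery point, and the retry with the alternate passes. Deep-copying the whole state on every block entry is the simplest possible choice; a practical system would save only the data modified inside the block, which is the kind of storage and time overhead the paper's rules are concerned with.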