Research: Design of loosely coupled processes capable of time-bounded cooperative recovery: the PTC/SL scheme

  • Authors:
  • K. H. (Kane) Kim

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of California, Irvine, CA 92717, USA

  • Venue:
  • Computer Communications
  • Year:
  • 1993

Abstract

Design of loosely coupled distributed computer systems (DCS) required to tolerate propagated errors caused by software and/or hardware is a technological challenge that has been inadequately dealt with. In this paper, we adopt the view that a truly loosely coupled DCS consists of loosely coupled interacting processes distributed among multiple physical sites where each process is designed in the 'partitioned design' mode, i.e. designed with its interface specification only, rather than with full knowledge of interfaces between other processes (or sites). It then follows naturally that fault tolerance capabilities must be designed into loosely coupled processes in such systems without violating the partitioned design policy. The programmer-transparent coordination (PTC) scheme is one such approach that has been evolving since 1978. While the basic PTC scheme, called the PTC/OR (PTC with obedient receiver) scheme, is a scheme for facilitating various forms of cooperative backward recovery in systems of loosely coupled processes, it has one drawback: the difficulty of bounding worst-case recovery time. After discussing various fundamentally different solution approaches and their limitations, a promising approach, called the PTC/SL (PTC with session leaders) scheme, which superimposes additional rules on structuring process interactions onto those of the PTC/OR scheme, is presented. Under the PTC/SL scheme, various flexible forms of process interactions are still allowed while the task of ensuring bounded recovery time is made a simple one.