Operating system level support for coherence in distributed systems

  • Authors:
  • Mike Livesey; Colin Allison

  • Affiliations:
  • University of St Andrews, North Haugh, St Andrews, Scotland; University of St Andrews, North Haugh, St Andrews, Scotland

  • Venue:
  • EW 5 Proceedings of the 5th workshop on ACM SIGOPS European workshop: Models and paradigms for distributed systems structuring
  • Year:
  • 1992

Abstract

Distributed system builders are faced with the task of meeting a variety of requirements on the global behaviour of the target system, such as stability, fault-tolerance and failure recovery, concurrency control, commitment, and consistency of replicated data. The subset of these requirements relevant to a particular application we call its coherence constraint. The coherence constraint may be very difficult to enforce.

Existing operating system services do not provide the system builder with an adequate platform for addressing coherence, although some systems address particular aspects of coherence; for example, Isis [3] addresses the fault-tolerance issue. Even recent developments in micro-kernels such as Mach 3.0 [4] and Chorus [18], which have concentrated on supporting the shared-memory abstraction, still leave the systems builder to bridge a significant gap between OS services and basic coherence requirements. The variety of coherence requirements has given rise to a welter of mechanisms having a familial resemblance yet lacking real conceptual integration [16,17,20]. Consequently, the distributed application programmer treats each requirement in isolation, often resulting in costly solutions which are nevertheless obscure and idiosyncratic.

Such problems have been observed in the context of object-based programming environments such as Argus [13], Clouds [7] and others [6]. They are confirmed by our own experience with a persistent object store transaction mechanism using NFS-oriented file locking [5,15].

This paper describes an approach to distributed coherence enforcement based upon rollback. The approach is optimistic in the sense that violations of coherence are resolved rather than prevented: rollback is the agent of this resolution.

Support for coherence is provided by units of distributed computation called transactions. This transaction mechanism is highly controllable, being designed to support advanced database requirements, involving "non-atomic" transactions, as well as conventional atomic transactions (cf. [19]). The transaction service is underpinned by rollback to provide the synchronisation, supported in turn by stable checkpointing and an integrated IPC protocol.

The approach raises two key issues. The first is the problem of disseminating rollback properly through a distributed system. The second arises because computational progress does not occur monotonically in physical time but along its own virtual time axis, and concerns the interaction of these two time axes.