Operating system level support for coherence in distributed systems

Authors:
Mike Livesey; Colin Allison
Affiliations:
University of St Andrews, North Haugh, St Andrews, Scotland; University of St Andrews, North Haugh, St Andrews, Scotland
Venue:
EW 5 Proceedings of the 5th workshop on ACM SIGOPS European workshop: Models and paradigms for distributed systems structuring
Year:
1992

Cited by:
0
References:
Optimistic recovery in distributed systems. ACM Transactions on Computer Systems (TOCS)
Virtual time. ACM Transactions on Programming Languages and Systems (TOPLAS)
Distributed programming in Argus. Communications of the ACM
High-Performance Fault-Tolerant VLSI Systems Using Micro Rollback. IEEE Transactions on Computers
Principles of distributed database systems. Principles of distributed database systems
Distributed, object-based programming systems. ACM Computing Surveys (CSUR)
Concurrency control in advanced database applications. ACM Computing Surveys (CSUR)
Experience with transactions in QuickSilver. SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Design and Evaluation of the Rollback Chip: Special Purpose Hardware for Time Warp. IEEE Transactions on Computers
Time, clocks, and the ordering of events in a distributed system. Communications of the ACM

Abstract

Distributed system builders are faced with the task of meeting a variety of requirements on the global behaviour of the target system, such as stability, fault-tolerance and failure recovery, concurrency control, commitment, and consistency of replicated data. The subset of these requirements relevant to a particular application we call its coherence constraint. The coherence constraint may be very difficult to enforce.

Existing operating system services do not provide the system builder with an adequate platform for addressing coherence, although some systems address particular aspects of it; for example, Isis [3] addresses fault tolerance. Even recent developments in micro-kernels such as Mach 3.0 [4] and Chorus [18], which have concentrated on supporting the shared-memory abstraction, still leave the system builder to bridge a significant gap between OS services and basic coherence requirements. The variety of coherence requirements has given rise to a welter of mechanisms having a familial resemblance yet lacking real conceptual integration [16,17,20]. Consequently, the distributed application programmer treats each requirement in isolation, often resulting in costly solutions which are nevertheless obscure and idiosyncratic.

Such problems have been observed in the context of object-based programming environments such as Argus [13], Clouds [7] and others [6]. They are confirmed by our own experience with a persistent object store transaction mechanism using NFS-oriented file locking [5,15].

This paper describes an approach to distributed coherence enforcement based upon rollback. The approach is optimistic in the sense that violations of coherence are resolved rather than prevented: rollback is the agent of this resolution.

Support for coherence is provided by units of distributed computation called transactions. This transaction mechanism is highly controllable, being designed to support advanced database requirements involving "non-atomic" transactions as well as conventional atomic transactions (cf. [19]). The transaction service is underpinned by rollback to provide the synchronisation, supported in turn by stable checkpointing and an integrated IPC protocol.

The approach raises two key issues. The first is the problem of disseminating rollback properly through a distributed system. The second arises because computational progress does not occur monotonically in physical time but along its own virtual time axis, and concerns the interaction of these two time axes.
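
The general idea of optimistic, rollback-based resolution with checkpoints and a virtual time axis can be illustrated with a minimal sketch. This is not the mechanism described in the paper: the class `OptimisticProcess`, its fields, and the order-sensitive update rule are hypothetical and chosen only for illustration. The sketch models a single process that checkpoints its state after every message and, when a message arrives with a virtual timestamp earlier than its local virtual time, rolls back to the latest checkpoint at or before that timestamp and replays the undone messages in timestamp order. It deliberately omits the first key issue named above, disseminating rollback to other processes, and keeps checkpoints in memory rather than on stable storage.

```python
class OptimisticProcess:
    """Hypothetical single process that rolls back on late (straggler) messages."""

    def __init__(self):
        self.state = 0
        self.lvt = 0                  # local virtual time
        self.checkpoints = [(0, 0)]   # (virtual_time, state), oldest first
        self.processed = []           # (timestamp, value) in applied order
        self.rollbacks = 0

    def _apply(self, ts, value):
        # Order-sensitive update (assumption for illustration): appending a
        # digit makes the final state reveal the order messages were applied.
        self.state = self.state * 10 + value
        self.lvt = ts
        self.processed.append((ts, value))
        self.checkpoints.append((ts, self.state))

    def receive(self, ts, value):
        if ts >= self.lvt:
            # In order on the virtual time axis: proceed optimistically.
            self._apply(ts, value)
            return
        # Straggler: virtual-time order was violated, so resolve by rollback.
        self.rollbacks += 1
        redo = [m for m in self.processed if m[0] > ts]
        self.processed = [m for m in self.processed if m[0] <= ts]
        self.checkpoints = [c for c in self.checkpoints if c[0] <= ts]
        self.lvt, self.state = self.checkpoints[-1]
        # Replay the straggler and the undone messages in timestamp order.
        for m in sorted(redo + [(ts, value)]):
            self._apply(*m)


# Messages arrive in physical order 1, 3, 2 but are applied in virtual order.
p = OptimisticProcess()
p.receive(1, 1)
p.receive(3, 3)
p.receive(2, 2)   # arrives late and triggers one rollback
print(p.state, p.rollbacks)  # prints "123 1"
```

Without the rollback branch the same arrival order would leave the state at 132, which is the kind of violation the optimistic approach resolves after the fact rather than prevents.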