A Comparison of Entry Consistency and Lazy Release Consistency Implementations

Authors:
Sarita V. Adve; Alan L. Cox; Sandhya Dwarkadas; Ramakrishnan Rajamony; Willy Zwaenepoel
Venue:
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Year:
1996

Abstract

This paper compares several implementations of entry consistency (EC) and lazy release consistency (LRC), two relaxed memory models used in software distributed shared memory (DSM) systems. We use six applications in our study: SOR, Quicksort, Water, Barnes-Hut, IS, and 3D-FFT. For these applications, EC's requirement that all shared data be associated with a synchronization object leads to a fair amount of additional programming effort. We identify, in particular, extra synchronization, lock rebinding, and object granularity as sources of extra complexity. In terms of performance, for this set of applications and the computing environment used, neither model is consistently better than the other. For SOR and IS, execution times are about the same, but LRC is faster for Water (33%) and Barnes-Hut (41%), and EC is faster for Quicksort (14%) and 3D-FFT (10%). Among the implementations of EC and LRC, we independently vary the method for write trapping and the method for write collection. Our goal is to separate implementation issues from any particular model. We consider write trapping by compiler instrumentation of the code and by twinning (comparing the current version of shared data with an older version). Write collection is done either by scanning timestamps or by building diffs, which are records of the changes to shared data. For write trapping in EC, twinning is faster if data is shared at the granularity of a single word; for granularities larger than a word, compiler instrumentation is faster. For write trapping in LRC, twinning gives the best performance for all applications. For write collection in EC, timestamping works best in applications dominated by migratory data, while diffing works best for other data. For LRC, increased communication overhead in transmitting timestamps becomes an additional factor working in favor of diffing for applications with fine-grain sharing.
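
The twinning and diffing mechanisms mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the 4-byte word size, the function names (make_twin, build_diff, apply_diff), and the (offset, bytes) diff encoding are all assumptions chosen for illustration. The idea is that a pristine copy of the shared data (the twin) is saved before it is modified, and a diff is later built by comparing the current data against the twin word by word.

```python
from typing import List, Tuple

WORD = 4  # assumed word size in bytes (hypothetical, for illustration only)

Diff = List[Tuple[int, bytes]]  # hypothetical encoding: (byte offset, new bytes)


def make_twin(page: bytearray) -> bytes:
    """Save an unmodified copy (the twin) of the shared data before it is written."""
    return bytes(page)


def build_diff(twin: bytes, page: bytearray) -> Diff:
    """Compare the page with its twin word by word and record each changed run."""
    diff: Diff = []
    run_start = None
    for off in range(0, len(page), WORD):
        changed = page[off:off + WORD] != twin[off:off + WORD]
        if changed and run_start is None:
            run_start = off  # a run of modified words begins here
        elif not changed and run_start is not None:
            diff.append((run_start, bytes(page[run_start:off])))
            run_start = None
    if run_start is not None:  # the run extends to the end of the page
        diff.append((run_start, bytes(page[run_start:])))
    return diff


def apply_diff(page: bytearray, diff: Diff) -> None:
    """Apply recorded changes to another copy of the same data."""
    for off, data in diff:
        page[off:off + len(data)] = data
```

In this sketch only the modified words travel in the diff, so a copy of the data held elsewhere can be brought up to date by calling apply_diff without transferring the unmodified words.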