This paper compares several implementations of entry consistency (EC) and lazy release consistency (LRC), two relaxed memory models used in software distributed shared memory (DSM) systems. We use six applications in our study: SOR, Quicksort, Water, Barnes-Hut, IS, and 3D-FFT. For these applications, EC's requirement that all shared data be associated with a synchronization object leads to a fair amount of additional programming effort. In particular, we identify extra synchronization, lock rebinding, and object granularity as sources of extra complexity. In terms of performance, for the applications and the computing environment we used, neither model is consistently better than the other. For SOR and IS, execution times are about the same, but LRC is faster for Water (33%) and Barnes-Hut (41%), and EC is faster for Quicksort (14%) and 3D-FFT (10%). Among the implementations of EC and LRC, we independently vary the method for write trapping and the method for write collection. Our goal is to separate implementation issues from any particular model. We consider write trapping by compiler instrumentation of the code and by twinning (comparing the current version of shared data with an older version). Write collection is done either by scanning timestamps or by building diffs, which are records of the changes to shared data. For write trapping in EC, twinning is faster if data is shared at the granularity of a single word. For granularities larger than a word, compiler instrumentation is faster. For write trapping in LRC, twinning gives the best performance for all applications. For write collection in EC, timestamping works best in applications dominated by migratory data, while diffing works best for other data. For LRC, the increased communication overhead of transmitting timestamps is an additional factor in favor of diffing for applications with fine-grain sharing.
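To make the two write-collection methods concrete, the following is a minimal sketch in Python. It is illustrative only and is not taken from the paper's implementations. It assumes a shared page is a list of integer words, and every name in it (`make_twin`, `build_diff`, `apply_diff`, `TimestampedPage`) is hypothetical. The first half shows twinning followed by diff construction. The second half shows timestamp scanning, where the `write` method stands in for a store that a compiler would have instrumented.

```python
from typing import List, Tuple

Word = int
Diff = List[Tuple[int, Word]]  # (word index, new value) pairs


def make_twin(page: List[Word]) -> List[Word]:
    """Copy the page before it is modified (the 'older version')."""
    return list(page)


def build_diff(twin: List[Word], page: List[Word]) -> Diff:
    """Compare the current page with its twin and record changed words."""
    return [(i, new) for i, (old, new) in enumerate(zip(twin, page)) if old != new]


def apply_diff(page: List[Word], diff: Diff) -> None:
    """Bring another copy of the page up to date from a diff."""
    for index, value in diff:
        page[index] = value


class TimestampedPage:
    """Hypothetical page whose words each carry a last-write timestamp."""

    def __init__(self, words: List[Word]) -> None:
        self.words = list(words)
        self.stamps = [0] * len(words)  # 0 means "never written"

    def write(self, index: int, value: Word, now: int) -> None:
        # Stands in for a compiler-instrumented store: the write is
        # trapped by updating the timestamp alongside the data.
        self.words[index] = value
        self.stamps[index] = now

    def collect_since(self, last_seen: int) -> Diff:
        """Scan timestamps and return words written after last_seen."""
        return [
            (i, w)
            for i, (w, t) in enumerate(zip(self.words, self.stamps))
            if t > last_seen
        ]
```

In this sketch, a diff is built once and can be sent to any requester, whereas the timestamp scan is repeated per request and must carry or compare a timestamp for every word. This is one simplified way to see why the granularity of sharing affects which method is cheaper.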