Lazy release consistency for software distributed shared memory
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Cashmere-2L: software coherent shared memory on a clustered remote-write network
Proceedings of the sixteenth ACM symposium on Operating systems principles
Performance evaluation of the Orca shared-object system
ACM Transactions on Computer Systems (TOCS)
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A Unified Formalization of Four Shared-Memory Models
IEEE Transactions on Parallel and Distributed Systems
Home-Based SVM Protocols for SMP Clusters: Design and Performance
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
TreadMarks: distributed shared memory on standard workstations and operating systems
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Quantifying and Resolving Remote Memory Access Contention on Hardware DSM Multiprocessors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Hi-index | 0.00 |
In this paper, we examine the causes and effects of contention for shared data access in parallel programs running on a software distributed shared memory (DSM) system. Specifically, we experiment on two widely-used, page-based protocols, Princeton's home-based lazy release consistency (HLRC) and TreadMarks. For most of our programs, these protocols were equally affected by latency increases caused by contention and achieved similar performance. Where they differ significantly, HLRC's ability to manually eliminate load imbalance was the largest factor accounting for the difference. To quantify the effects of contention we either modified the application to eliminate the cause of the contention or modified the underlying protocol to efficiently handle it. Overall, we find that contention has profound effects on performance: eliminating contention reduced execution time by 64% in the most extreme case, even at the relatively modest scale of 32 nodes that we consider in this paper.