Software Distributed Shared Memory (DSM) systems have been a research topic for over a decade. While good performance has been achieved in some cases, consistent performance has continued to elude researchers. This paper investigates the performance of DSM protocols running highly regular scientific applications. Such applications should be ideal targets for DSM research because past behavior gives complete, or nearly complete, information about future behavior. We show that a modified home-based protocol can significantly outperform more general protocols in this application domain because of its reduced protocol complexity. Nonetheless, such protocols still do not perform as well as expected. We show that one of the major factors limiting performance is interaction with the operating system on page faults and page protection changes. We further optimize our protocol by completely eliminating such memory manipulation calls from steady-state execution. The resulting protocol improves average application performance by a further 34%, on top of the 19% improvement gained by our initial modification of the home-based protocol.