Evaluation of release consistent software distributed shared memory on emerging network technology
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Integration of message passing and shared memory in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
An evaluation of software-based release consistent protocols
Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler and software distributed shared memory support for irregular applications
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimizing communication in HPF programs on fine-grain distributed shared memory
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Using integer sets for data-parallel program analysis and optimization
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating barrier synchronization for compiler-parallelized codes on software DSMs
International Journal of Parallel Programming - Special issue on languages and compilers for parallel computing. Part I
High performance Fortran compilation techniques for parallelizing scientific codes
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Improving the performance of DSM systems via compiler involvement
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Enhancing Software DSM for Compiler-Parallelized Applications
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Comparison of Entry Consistency and Lazy Release Consistency Implementations
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Compile-time Synchronization Optimizations for Software DSMs
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Message-driven relaxed consistency in a software distributed shared memory
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Efficient Support for Two-Dimensional Data Distributions in Distributed Shared Memory Systems
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Hi-index | 0.00 |
We present two novel optimizations for compiling High Performance Fortran (HPF) to page-based software distributed shared memory systems (SDSM). One technique, compiler-managed restricted consistency, uses compiler-derived knowledge to delay the application of memory consistency operations to data that is provably not shared in the current synchronization interval, thus reducing false sharing. The other technique, compiler-managed shared buffers, when combined with the previous optimization, eliminates fragmentation. Together, the two techniques permit compiler-generated code to efficiently apply multi-dimensional computation partitioning and wavefront parallelism to execute efficiently on SDSM systems.