A global approach to detection of parallelism
A global approach to detection of parallelism
Automatic decomposition of scientific programs for parallel execution
POPL '87 Proceedings of the 14th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
A technique for summarizing data access and its use in parallelism enhancing transformations
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Simple but effective techniques for NUMA memory management
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Mirage: a coherent distributed shared memory design
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Run-time scheduling and execution of loops on message passing machines
Journal of Parallel and Distributed Computing - Special issue: algorithms for hypercube computers
Munin: distributed shared memory based on type-specific memory coherence
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Compiler optimizations for Fortran D on MIMD distributed-memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Lazy release consistency for software distributed shared memory
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Evaluation of release consistent software distributed shared memory on emerging network technology
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Automatic and interactive parallelization
Automatic and interactive parallelization
Preliminary experiences with the Fortran D compiler
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Compilation techniques for block-cyclic distributions
ICS '94 Proceedings of the 8th international conference on Supercomputing
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Dependence graphs and compiler optimizations
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiler Analysis for Irregular Problems in Fortran D
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Program Transformation for Locality Using Affinity Regions
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
KOAN: A Shared Virtual Memory for the iPSC/2 Hypercube
CONPAR '92/ VAPP V Proceedings of the Second Joint International Conference on Vector and Parallel Processing: Parallel Processing
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
A synthesis of memory mechanisms for distributed architectures
ICS '01 Proceedings of the 15th international conference on Supercomputing
Eliminating Barrier Synchronization for Compiler-Parallelized Codes on Software DSMs
International Journal of Parallel Programming
Enhancing Software DSM for Compiler-Parallelized Applications
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Compilation and Runtime-Optimizations for Software Distributed Shared Memory
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Compile-time Synchronization Optimizations for Software DSMs
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Towards implementation of a novel scheme for data prefetching on distributed shared memory systems
The Journal of Supercomputing
Hi-index | 0.00 |
Distributed shared memory (DSM) systems provide an illusion of shared memory on distributed memory systems such as workstation networks and some parallel computers such as the Cray T3D and Convex SPP-1. This illusion is provided either by enhancements to hardware, software, or a combination thereof. On these systems, users can write programs using a shared memory style of programming instead of message passing which is tedious and error prone. Our experience with one such system, TreadMarks, has shown that a large class of applications does not perform well on these systems. TreadMarks is a software distributed shared memory system designed by Rice University researchers to run on networks of workstations and massively parallel computers. Due to the distributed nature of the memory system, shared memory synchronization primitives such as locks and barriers often cause significant amounts of communication.We have provided a set of powerful primitives that will alleviate the problems with locks and barriers on such systems. We have designed two sets of primitives, the first set maintains coherence and is easy to use by a programmer, the second set does not maintain coherence and is best used by a compiler. These primitives require that the underlying DSM be enhanced. We have implemented some of our primitives on the TreadMarks DSM system and obtained reasonable performance improvements on application kernels from molecular dynamics and numerical analysis. Furthermore, we have identified key compiler optimizations that use the non-coherent primitives to reduce synchronization overhead and improve the performance of DSM systems.