This paper presents a methodology for designing a distributed shared memory by decomposing it into two layers. An application-independent layer supplies the basic functionality for accessing shared structures and optimizes it for the underlying architecture. On top of this layer, which can be seen as an application-independent run-time support, an application-dependent layer defines the most suitable consistency model for the class of applications considered and implements the caching and prefetching strategies appropriate to that model. To illustrate the methodology, we introduce DVSA, a package that implements the application-independent layer, and SHOB, an example of the second layer. SHOB defines a release consistency model for iterative numerical algorithms and implements the corresponding caching and prefetching strategies. We present experimental results for the methodology and discuss the performance of a uniform multigrid method developed with SHOB on a massively parallel architecture, the Meiko CS2, and on a cluster of workstations.
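The two-layer decomposition can be illustrated with a minimal, single-process sketch. It is not the DVSA or SHOB interface: the class names `BaseStore` and `ReleaseConsistentView`, their methods, and the access counters are all hypothetical, chosen only to show how a lower layer might expose plain shared accesses while an upper layer adds release-consistent caching and prefetching on top of it.

```python
class BaseStore:
    """Hypothetical application-independent layer: plain access to shared cells."""

    def __init__(self):
        self._cells = {}
        self.remote_reads = 0   # counts accesses that would cross the network
        self.remote_writes = 0

    def read(self, key):
        self.remote_reads += 1
        return self._cells.get(key, 0.0)

    def write(self, key, value):
        self.remote_writes += 1
        self._cells[key] = value


class ReleaseConsistentView:
    """Hypothetical application-dependent layer (assumed design, not SHOB itself).

    Reads are served from a local cache and writes are buffered locally;
    buffered writes reach the shared store only at release().
    """

    def __init__(self, store):
        self._store = store
        self._cache = {}
        self._dirty = {}

    def acquire(self, prefetch=()):
        # Drop possibly stale copies, then prefetch the keys the next
        # iteration is expected to read.
        self._cache.clear()
        for key in prefetch:
            self._cache[key] = self._store.read(key)

    def read(self, key):
        if key in self._dirty:          # a process always sees its own writes
            return self._dirty[key]
        if key not in self._cache:      # cache miss: fetch from the lower layer
            self._cache[key] = self._store.read(key)
        return self._cache[key]

    def write(self, key, value):
        self._dirty[key] = value        # buffered until release()

    def release(self):
        # Propagate all buffered writes to the shared store in one batch.
        for key, value in self._dirty.items():
            self._store.write(key, value)
            self._cache[key] = value
        self._dirty.clear()


store = BaseStore()
store.write("x", 1.0)

view = ReleaseConsistentView(store)
view.acquire(prefetch=["x"])
view.write("x", view.read("x") + 1.0)   # local update, not yet visible in store
view.release()                          # update becomes visible here
```

In this sketch, an iterative solver would call `acquire` at the start of each sweep and `release` at the end, so that remote traffic is concentrated at the synchronization points rather than spread over every access.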