Though more difficult to program, distributed-memory parallel machines provide greater scalability than their shared-memory counterparts. Software Distributed Shared Memory (SDSM) systems provide the abstraction of shared memory on a distributed machine. While SDSMs offer an attractive programming model, they cannot currently support all classes of scientific applications efficiently. One such class consists of applications with recurrences that cause dependencies across processors or nodes. A popular solution to such problems is pipelining, which breaks the computation into blocks; each processor computes a block, which in turn enables the next processor in the pipeline to compute its corresponding block. Once the pipeline is filled, the computation of blocks proceeds in parallel. While pipelining is useful, it is not efficiently supported by current SDSM systems.

This paper presents an approach to integrating pipelining into SDSM systems. We describe our design and implementation of one-way pipelining in an SDSM. The key idea is to retain the shared-memory model but design the extensions so that execution mimics what an explicit message-passing program would do. We show that one-way pipelining is superior to the two most common ways to program pipelined applications, distributed locks and explicit matrix transposition. Finally, we show that one-way pipelining is competitive with a hand-coded, explicit message-passing program.
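
The block-wise pipelining described above can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the paper's SDSM implementation: the recurrence `a[i][j] = a[i-1][j] + a[i][j-1]`, the row-wise partitioning across processors, and the names `sequential`, `pipelined`, `n_procs`, and `block` are all hypothetical choices made for illustration. Processor `p` owns a band of rows and works through column blocks from left to right; block `b` on processor `p` can start only after processor `p-1` has finished its own block `b`.

```python
from math import ceil


def sequential(n_rows, n_cols):
    """Reference result: plain nested loops over the recurrence."""
    a = [[1] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def pipelined(n_rows, n_cols, n_procs, block):
    """Simulate pipelined execution; return the result and the step count.

    Rows are split into contiguous bands, one per (hypothetical) processor.
    Columns are split into blocks of width `block`. In step t, processor p
    handles column block t - p, so all blocks handled in the same step are
    independent of each other and could run concurrently on a real machine.
    """
    a = [[1] * n_cols for _ in range(n_rows)]
    rows_per_proc = ceil(n_rows / n_procs)
    n_blocks = ceil(n_cols / block)
    steps = 0
    for t in range(n_procs + n_blocks - 1):
        for p in range(n_procs):
            b = t - p
            if not 0 <= b < n_blocks:
                continue  # processor p is idle: pipeline filling or draining
            row_lo = max(1, p * rows_per_proc)
            row_hi = min(n_rows, (p + 1) * rows_per_proc)
            col_lo = max(1, b * block)
            col_hi = min(n_cols, (b + 1) * block)
            for i in range(row_lo, row_hi):
                for j in range(col_lo, col_hi):
                    a[i][j] = a[i - 1][j] + a[i][j - 1]
        steps += 1
    return a, steps


if __name__ == "__main__":
    result, steps = pipelined(n_rows=8, n_cols=12, n_procs=4, block=3)
    print(result == sequential(8, 12), steps)
```

In this simulation the number of steps is `n_procs + n_blocks - 1`, which makes the usual block-size trade-off visible: smaller blocks fill the pipeline sooner but imply more hand-offs between neighbouring processors, each of which would cost a message or a synchronization on a real system.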