Paradigms for process interaction in distributed programs. ACM Computing Surveys (CSUR).
A static performance estimator to guide data partitioning decisions. PPOPP '91: Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
The Stanford Dash Multiprocessor. Computer.
Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM.
Automatic data mapping for distributed-memory parallel computers. ICS '92: Proceedings of the 6th International Conference on Supercomputing.
Model-driven mapping onto distributed memory parallel computers. Proceedings of the 1992 ACM/IEEE Conference on Supercomputing.
Replication techniques for speeding up parallel applications on distributed systems. Concurrency: Practice and Experience.
PARADIGM: a compiler for automatic data distribution on multicomputers. ICS '93: Proceedings of the 7th International Conference on Supercomputing.
Partitioning the global space for distributed memory systems. Proceedings of the 1993 ACM/IEEE Conference on Supercomputing.
Automatic data layout for High Performance Fortran. Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing.
A flexible operation execution model for shared distributed objects. Proceedings of the 11th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications.
Compiler techniques for data partitioning of sequentially iterated parallel loops. ICS '90: Proceedings of the 4th International Conference on Supercomputing.
Compiler and run-time support for semi-structured applications. ICS '97: Proceedings of the 11th International Conference on Supercomputing.
Concepts and Notations for Concurrent Programming. ACM Computing Surveys (CSUR).
Mesh Partitioning for Efficient Use of Distributed Systems. IEEE Transactions on Parallel and Distributed Systems.
The design and evaluation of a shared object system for distributed memory machines. OSDI '94: Proceedings of the 1st USENIX Conference on Operating Systems Design and Implementation.
This paper presents a general data layout scheme for partitioning and mapping data across processors in data-parallel applications. Our scheme generalizes the existing layouts (block, cyclic) and enables non-traditional ones (e.g., graph partitioning [7, 17]). A distributed algorithm uses the data layout together with the read/write access patterns to ensure consistency for data-parallel applications. We illustrate the applicability of our data layout and consistency schemes to different classes of scientific applications, and we present experimental results on the effectiveness of our approach for loosely synchronous, data-parallel applications.
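The abstract's central idea, that block and cyclic distributions are special cases of a general index-to-processor mapping, can be illustrated with a small sketch. The function names and the explicit mapping table below are our own illustration, not the paper's API; the point is only that a lookup table subsumes both classical layouts and irregular ones produced by a graph partitioner.

```python
def block_owner(i, n, p):
    """Owner of global index i when n elements are split into p contiguous blocks."""
    block = (n + p - 1) // p  # ceiling division gives the block size
    return i // block

def cyclic_owner(i, p):
    """Owner of global index i under a cyclic (round-robin) layout."""
    return i % p

def table_owner(i, table):
    """General layout: an explicit index -> processor table, e.g. one
    produced by a graph partitioner for an irregular mesh."""
    return table[i]

n, p = 10, 3
print([block_owner(i, n, p) for i in range(n)])   # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
print([cyclic_owner(i, p) for i in range(n)])     # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]

# Any layout, including block and cyclic, can be expressed as a table:
layout = {i: cyclic_owner(i, p) for i in range(n)}
print(table_owner(4, layout))                     # 1
```

Because the runtime consistency algorithm consults only the layout (and the access patterns), it is indifferent to whether the table came from a closed-form rule or a partitioning heuristic.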