Compile-Time Techniques for Data Distribution in Distributed Memory Machines

Authors:
J. Ramanujam; P. Sadayappan
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1991

Citing 16
Cited 51

Abstract

A solution to the problem of partitioning data for distributed memory machines is discussed. The solution uses a matrix notation to describe array accesses in fully parallel loops, which allows the derivation of sufficient conditions for communication-free partitioning (decomposition) of arrays. The paper presents a series of examples that illustrate the effectiveness of the technique for linear references, shows how loop transformations can be used to derive the necessary data decompositions, and gives a formulation that aids in deriving heuristics for minimizing communication when communication-free partitions are not feasible.
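To make the matrix notation concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's. Each reference X[A·i + a] to a single array inside a fully parallel loop is encoded as a pair (A, a), where A is the access matrix and a the constant offset vector, and the array is split along data hyperplanes q·d = c. If q^T A and q·a are identical for every reference, then at each iteration i all references land on one hyperplane, which is a sufficient condition for running that iteration on the hyperplane's owner without communication for that array. The helper names (`row_times_matrix`, `hyperplane_is_communication_free`, `is_canonical`, `candidate_hyperplanes`) and the brute-force search over small integer normals are hypothetical illustrations, not the paper's algorithm, which also handles multiple arrays and loop transformations.

```python
from functools import reduce
from itertools import product
from math import gcd

# Hypothetical encoding (an assumption, not the paper's notation):
# a reference X[A·i + a] is a pair (A, a), where A is a list of rows
# (one row per array dimension) and a is the constant offset vector.


def row_times_matrix(q, A):
    """Return the row vector q^T A as a tuple."""
    return tuple(
        sum(q[r] * A[r][c] for r in range(len(A))) for c in range(len(A[0]))
    )


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def hyperplane_is_communication_free(q, refs):
    """Sufficient check: if q^T A and q·a are the same for every (A, a) in refs,
    then at any iteration i all references satisfy q·(A i + a) = c for one c."""
    rows = {row_times_matrix(q, A) for A, _ in refs}
    consts = {dot(q, a) for _, a in refs}
    return len(rows) == 1 and len(consts) == 1


def is_canonical(q):
    """Keep one normal per hyperplane family: gcd 1 and first nonzero entry > 0."""
    g = reduce(gcd, (abs(x) for x in q))
    first = next(x for x in q if x != 0)
    return g == 1 and first > 0


def candidate_hyperplanes(refs, dim, bound=2):
    """Illustrative brute-force search over nonzero integer normals q of length
    dim (the number of array dimensions) with entries in [-bound, bound]."""
    found = []
    for q in product(range(-bound, bound + 1), repeat=dim):
        if any(q) and is_canonical(q) and hyperplane_is_communication_free(q, refs):
            found.append(q)
    return found


if __name__ == "__main__":
    # doall (i, j): ... = B[i-1, j] + B[i, j-1]
    identity = [[1, 0], [0, 1]]
    refs_b = [(identity, (-1, 0)), (identity, (0, -1))]
    print(candidate_hyperplanes(refs_b, dim=2))  # [(1, 1)]
```

For the references B[i-1, j] and B[i, j-1], the only canonical normal found within the search bound is q = (1, 1), so B is split into anti-diagonals whose index sum is constant. A row partition, q = (1, 0), fails because the two references at the same iteration fall on rows i-1 and i.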