Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory

Authors: G. N. Srinivasa Prasanna; A. Agrawal; B. R. Musicus
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 1994

Abstract

This paper presents a hierarchical approach for compiling macro dataflow graphs for multiprocessors with local memory. Macro dataflow graphs comprise several nodes (or macro operations) that must be executed subject to prespecified precedence constraints. Programs consisting of multiple nested loops, where the precedence constraints between the loops are known, can be viewed as macro dataflow graphs. The hierarchical compilation approach comprises a processor allocation phase followed by a partitioning phase. In the processor allocation phase, using estimated speedup functions for the macro nodes, computationally efficient techniques establish the sequencing and parallelism of macro operations for close-to-optimal run-times. The second phase partitions the computations in each macro node to maximize communication locality for the level of parallelism determined by the processor allocation phase. The same approach can also be used for programs consisting of multiple loop nests, when each of the nested loops can be characterized by a speedup function. These ideas have been implemented in a prototype structure-driven compiler, SDC, for expressions of matrix operations. The paper presents the performance of the compiler for several matrix expressions on a simulator of the Alewife multiprocessor.
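
The processor allocation idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm or the SDC compiler: it is a brute-force toy for two independent macro nodes, and everything in it is an assumption made for illustration, including the power-law speedup model s(p) = p**alpha and the names power_speedup, run_time, best_split, and allocate. It compares running the two nodes one after the other on all processors against running them side by side on a split of the processors.

```python
from typing import Callable, Tuple

# A speedup function maps a processor count to a speedup factor.
Speedup = Callable[[int], float]


def power_speedup(alpha: float) -> Speedup:
    """Hypothetical speedup model s(p) = p**alpha (an assumption, 0 < alpha <= 1)."""
    return lambda p: float(p) ** alpha


def run_time(work: float, speedup: Speedup, procs: int) -> float:
    """Estimated run time of one macro node on `procs` processors."""
    return work / speedup(procs)


def best_split(work_a: float, s_a: Speedup,
               work_b: float, s_b: Speedup,
               total_procs: int) -> Tuple[int, float]:
    """Exhaustively try every split of processors between two independent
    macro nodes run concurrently; return (processors for A, finish time)."""
    if total_procs < 2:
        raise ValueError("need at least 2 processors to run two nodes side by side")
    best_p, best_t = 1, float("inf")
    for p_a in range(1, total_procs):
        t = max(run_time(work_a, s_a, p_a),
                run_time(work_b, s_b, total_procs - p_a))
        if t < best_t:
            best_p, best_t = p_a, t
    return best_p, best_t


def allocate(work_a: float, s_a: Speedup,
             work_b: float, s_b: Speedup,
             total_procs: int) -> Tuple[str, int, float]:
    """Choose between sequencing both nodes on all processors and running
    them in parallel on a processor split, whichever finishes sooner."""
    seq_t = (run_time(work_a, s_a, total_procs)
             + run_time(work_b, s_b, total_procs))
    p_a, par_t = best_split(work_a, s_a, work_b, s_b, total_procs)
    if par_t < seq_t:
        return ("parallel", p_a, par_t)
    return ("sequential", total_procs, seq_t)


if __name__ == "__main__":
    s = power_speedup(0.5)  # sublinear speedup favors running nodes side by side
    print(allocate(100.0, s, 100.0, s, 16))
```

With sublinear speedup, giving each node fewer processors wastes less efficiency than giving one node all of them, so the side-by-side schedule tends to win. With perfectly linear speedup the two schedules tie at best, and the sketch then falls back to sequencing.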