This paper presents a hierarchical approach for compiling macro dataflow graphs for multiprocessors with local memory. Macro dataflow graphs comprise several nodes (or macro operations) that must be executed subject to prespecified precedence constraints. Programs consisting of multiple nested loops, where the precedence constraints between the loops are known, can be viewed as macro dataflow graphs. The hierarchical compilation approach comprises a processor allocation phase followed by a partitioning phase. In the processor allocation phase, using estimated speedup functions for the macro nodes, computationally efficient techniques establish the sequencing and parallelism of macro operations for close-to-optimal run-times. The second phase partitions the computations in each macro node to maximize communication locality for the level of parallelism determined by the processor allocation phase. The same approach can also be used for programs consisting of multiple loop nests, when each of the nested loops can be characterized by a speedup function. These ideas have been implemented in a prototype structure-driven compiler, SDC, for expressions of matrix operations. The paper presents the performance of the compiler for several matrix expressions on a simulator of the Alewife multiprocessor.
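As a rough illustration of the processor allocation phase, the sketch below decides whether two independent macro nodes should run one after the other on all processors or concurrently on disjoint subsets. It is not the paper's algorithm: the Amdahl-style speedup function, the exhaustive search over splits, and the names `amdahl_speedup` and `allocate_two_nodes` are all assumptions made for illustration.

```python
from typing import Callable, Tuple

SpeedupFn = Callable[[int], float]


def amdahl_speedup(serial_fraction: float) -> SpeedupFn:
    """Hypothetical speedup function (Amdahl's law); an assumption, not from the paper."""
    def speedup(p: int) -> float:
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
    return speedup


def allocate_two_nodes(
    work_a: float, speedup_a: SpeedupFn,
    work_b: float, speedup_b: SpeedupFn,
    processors: int,
) -> Tuple[str, float, int, int]:
    """Return (mode, makespan, procs_a, procs_b) for two independent macro nodes.

    Compares sequential execution (each node gets every processor in turn)
    with every concurrent split of the processors between the two nodes.
    """
    # Sequential: the run times add up, but each node uses all processors.
    sequential = work_a / speedup_a(processors) + work_b / speedup_b(processors)
    best = ("sequential", sequential, processors, processors)

    # Concurrent: the makespan is the slower of the two nodes.
    for procs_a in range(1, processors):
        procs_b = processors - procs_a
        makespan = max(work_a / speedup_a(procs_a), work_b / speedup_b(procs_b))
        if makespan < best[1]:
            best = ("concurrent", makespan, procs_a, procs_b)
    return best


if __name__ == "__main__":
    # Two equal nodes, each with an assumed 20% serial fraction, on 4 processors.
    node = amdahl_speedup(0.2)
    print(allocate_two_nodes(100.0, node, 100.0, node, 4))
```

Under these assumptions, nodes with sublinear speedup tend to favor concurrent execution on fewer processors each, while nodes with perfect speedup lose nothing by running in sequence. That trade-off is what speedup functions let an allocator evaluate before any partitioning is done.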