Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays
IEEE Transactions on Computers
Digital image processing (2nd ed.)
A design methodology for synthesizing parallel algorithms and architectures. Journal of Parallel and Distributed Computing
Nearest-neighbor mapping of finite element graphs onto processor meshes. IEEE Transactions on Computers
Synthesizing Linear Array Algorithms from Nested FOR Loop Algorithms. IEEE Transactions on Computers
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors. IEEE Transactions on Computers
The parallel execution of DO loops. Communications of the ACM
The Generation of a Class of Multipliers: Synthesizing Highly Parallel Algorithms in VLSI. IEEE Transactions on Computers
Pipelined Data Parallel Algorithms-II: Design. IEEE Transactions on Parallel and Distributed Systems
Multiprocessors: discussion of some theoretical and practical problems
Compiler technology for parallel scientific computation. Scientific Programming
A Unifying Lattice-Based Approach for the Partitioning of Systolic Arrays via LPGS and LSGP. Journal of VLSI Signal Processing Systems
Chain Grouping: A Method for Partitioning Loops onto Mesh-Connected Processor Arrays. IEEE Transactions on Parallel and Distributed Systems
Communication-free partitioning of nested loops. Compiler optimizations for scalable parallel systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers. IEEE Transactions on Parallel and Distributed Systems
Evaluation of Loop Grouping Methods Based on Orthogonal Projection Spaces. Proceedings of the 2000 International Conference on Parallel Processing (ICPP '00)
Automatic parallel code generation for tiled nested loops. Proceedings of the 2004 ACM Symposium on Applied Computing
Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs. The Journal of Supercomputing
A method is presented for executing nested loops with constant loop-carried dependencies in parallel on message-passing multiprocessor systems while reducing communication overhead. In the partitioning phase, the nested loop is divided into blocks that reduce interblock communication, without regard to the machine topology. The execution ordering of the iterations is defined by a time function based on L. Lamport's (1974) hyperplane method. The iterations are then partitioned into blocks so that this execution ordering is preserved and the amount of interblock communication is minimized. In the mapping phase, the partitioned blocks are assigned to a fixed-size multiprocessor system so that blocks that must exchange data frequently are allocated to the same processor or to neighboring processors. A heuristic mapping algorithm for hypercube machines is proposed.
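The two phases described above can be illustrated with a small sketch. This is not the paper's exact algorithm: the hyperplane vector (1, 1), the blocking along one loop axis, and the Gray-code assignment of block indices to hypercube nodes are all simplifying assumptions chosen to make the idea concrete.

```python
# Illustrative sketch (assumptions, not the paper's method): hyperplane
# scheduling of a 2D loop nest, partitioning into blocks, and mapping
# blocks to hypercube nodes so adjacent blocks land on neighboring nodes.

def hyperplane_time(i, j, pi=(1, 1)):
    # Lamport's hyperplane method orders iterations by t = pi . (i, j);
    # all iterations on the same hyperplane can execute in parallel.
    return pi[0] * i + pi[1] * j

def partition_into_blocks(n, block_size):
    # Group iterations into contiguous strips along the i axis. For
    # assumed dependence vectors such as (1, 0) and (0, 1), only the
    # (1, 0) dependencies cross block boundaries, which keeps interblock
    # communication low without disturbing the hyperplane ordering.
    blocks = {}
    for i in range(n):
        for j in range(n):
            blocks.setdefault(i // block_size, []).append((i, j))
    return blocks

def gray(b):
    # Binary-reflected Gray code: consecutive block indices differ in
    # one bit, i.e. they map to neighboring hypercube processors.
    return b ^ (b >> 1)

def map_blocks_to_hypercube(blocks, dim):
    # Heuristic mapping: blocks that exchange data frequently (here,
    # adjacent strips) are placed on the same or neighboring nodes.
    nprocs = 1 << dim
    return {b: gray(b % nprocs) for b in blocks}
```

For a 4x4 loop nest with `block_size=2` and a 2-dimensional hypercube, `partition_into_blocks(4, 2)` yields two strips of eight iterations each, and `map_blocks_to_hypercube` places strips 0 and 1 on nodes 0 and 1, which differ in a single bit and are therefore directly connected.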