Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors
IEEE Transactions on Computers
Supercompilers for parallel and vector computers
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
The parallel execution of DO loops
Communications of the ACM
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Partitioning and Labeling of Loops by Unimodular Transformations
IEEE Transactions on Parallel and Distributed Systems
Controlling application grain size on a network of workstations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Optimal tile size adjustment in compiling general DOACROSS loop nests
ICS '95 Proceedings of the 9th international conference on Supercomputing
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs
International Journal of Parallel Programming
Chain Grouping: A Method for Partitioning Loops onto Mesh-Connected Processor Arrays
IEEE Transactions on Parallel and Distributed Systems
Compile Time Barrier Synchronization Minimization
IEEE Transactions on Parallel and Distributed Systems
Optimal task scheduling at run time to exploit intra-tile parallelism
Parallel Computing
Efficient hybrid parallelisation of tiled algorithms on SMP clusters
International Journal of Computational Science and Engineering
Selecting the tile shape to reduce the total communication volume
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Resource-sensitive synchronization inference by abduction
POPL '12 Proceedings of the 39th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proof-Directed Parallelization Synthesis by Separation Logic
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiling affine loop nests for distributed-memory parallel architectures
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
If the iterations of a loop nest cannot be partitioned into independent tasks, data communication arising from data dependences is inevitable when the nest is executed on a parallel machine. Such a loop nest is referred to as a DOACROSS loop nest. This paper is concerned with compiler algorithms for parallelizing DOACROSS loop nests on distributed-memory multicomputers. We present a method that combines loop tiling, chain-based scheduling, and indirect message passing to generate efficient message-passing parallel code. Experimental results on the Fujitsu AP1000 show that, by tuning these techniques, low communication overhead and high speedup can be achieved for DOACROSS loop nests on multicomputers.