Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers, and Environments for Parallel Programming
On the problem of optimizing data transfers for complex memory systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Semi-automatic process partitioning for parallel computation
International Journal of Parallel Programming
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Process decomposition through locality of reference
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiler techniques for data partitioning of sequentially iterated parallel loops
ICS '90 Proceedings of the 4th international conference on Supercomputing
The parallel execution of DO loops
Communications of the ACM
Optimizing Supercompilers for Supercomputers
Pipelined Data Parallel Algorithms-II: Design
IEEE Transactions on Parallel and Distributed Systems
Partitioning and Mapping Nested Loops on Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
Compiling Global Name-Space Parallel Loops for Distributed Execution
IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
Data-localization for Fortran macro-dataflow computation using partial static task assignment
ICS '96 Proceedings of the 10th international conference on Supercomputing
Optimal Data Scheduling for Uniform Multidimensional Applications
IEEE Transactions on Computers
Efficient Algorithms for Data Distribution on Distributed Memory Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers
The Journal of Supercomputing
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Communication-free partitioning of nested loops
Compiler optimizations for scalable parallel systems
A compilation method for communication-efficient partitioning of DOALL loops
Compiler optimizations for scalable parallel systems
Automatic data and computation decomposition on distributed memory parallel computers
ACM Transactions on Programming Languages and Systems (TOPLAS)
The Journal of Supercomputing
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Practical parallel computing
Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing
The Journal of Supercomputing
Linear data distribution based on index analysis
High performance scientific and engineering computing
Memetic algorithms for parallel code optimization
International Journal of Parallel Programming
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
In distributed memory multicomputers, local memory accesses are much faster than those involving interprocessor communication. To reduce or even eliminate interprocessor communication, the array elements in a program must be carefully distributed across the local memories of the processors for parallel execution. We focus on techniques that allow parallelizing compilers to allocate the array elements of nested loops onto multicomputers in a communication-free fashion. We first analyze the pattern of references among all arrays referenced by a nested loop, and then partition the iteration space into blocks that require no interblock communication. The arrays can be partitioned under communication-free criteria with either nonduplicate or duplicate data. Finally, we propose a heuristic method for mapping the partitioned array elements and iterations onto fixed-size multicomputers while taking load balancing into account. With these methods, nested loops can execute without any communication overhead on distributed memory multicomputers. Moreover, we study the performance of the nonduplicate- and duplicate-data strategies for matrix multiplication.
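The idea of communication-free partitioning with duplicate data can be illustrated with matrix multiplication, the abstract's own case study. The sketch below is a minimal illustration under assumed conventions, not the paper's actual algorithm: C = A × B is partitioned by row blocks, each "processor" holds its rows of A and a duplicated copy of B, so every block computes its rows of C with no interblock communication. The function names (`partition_rows`, `parallel_matmul`) are hypothetical.

```python
# Hypothetical sketch: communication-free row-block partitioning of
# C = A @ B with B duplicated on every processor. Each block reads only
# its own rows of A and its local copy of B -- no interblock traffic.

def partition_rows(n_rows, n_procs):
    """Split row indices 0..n_rows-1 into n_procs load-balanced blocks."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)  # spread the remainder
        blocks.append(range(start, start + size))
        start += size
    return blocks

def matmul_block(A, B, rows):
    """One processor's work: compute only its assigned rows of C."""
    k, n = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(n)]
            for i in rows]

def parallel_matmul(A, B, n_procs):
    """Conceptually parallel; executed sequentially here for clarity."""
    C = []
    for rows in partition_rows(len(A), n_procs):
        C.extend(matmul_block(A, B, rows))
    return C
```

For example, `parallel_matmul([[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10]], 2)` returns `[[25, 28], [57, 64], [89, 100]]`. Duplicating B trades memory for the elimination of all communication, which is the duplicate-data side of the trade-off the abstract evaluates.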