Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A general framework for iteration-reordering loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
The parallel execution of DO loops
Communications of the ACM
Optimizing Supercompilers for Supercomputers
Optimizing Supercompilers for Supercomputers
Dependence Analysis for Supercomputing
Dependence Analysis for Supercomputing
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Locality Analysis for Distributed Shared-Memory Multiprocessors
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Quantifying the Multi-level Nature of Tiling Interactions
LCPC '97 Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing
On Estimating and Enhancing Cache Effectiveness
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Automatic parallelization for symmetric shared-memory multiprocessors
CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
Optimized Execution of Fortran 90 Array Language on Symmetric Shared-Memory Multiprocessors
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
The increasing depth of memory and parallelism hierarchies in future scalable computer systems poses many challenges to parallelizing compilers. In this paper, we address the problem of selecting and implementing iteration-reordering loop transformations for hierarchical parallelism and locality. We present a two-pass algorithm for selecting sequences of Block, Unimodular, Parallel, and Coalesce transformations for optimizing locality and parallelism for a specified parallelism hierarchy model. These general transformation sequences are implemented using a framework for iteration-reordering loop transformations that we developed in past work [15].
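To make the transformation names concrete, the following minimal sketch shows what Block, Unimodular (here, a simple loop interchange), and Coalesce do to the iteration order of a two-deep loop nest. It is an illustration only, not the paper's selection algorithm or its framework: the function names, the rectangular iteration space, and the tile sizes `bi` and `bj` are all assumptions introduced for this example. The Parallel transformation is omitted because it changes how iterations are scheduled rather than the set of iterations visited.

```python
# Illustrative sketch only: all names here are hypothetical, not from the paper.
# Each function returns the sequence of (i, j) iterations in the order visited.

def original(n, m):
    """Original two-deep loop nest: i outermost, j innermost."""
    return [(i, j) for i in range(n) for j in range(m)]

def interchanged(n, m):
    """Unimodular example: loop interchange, so j becomes the outer loop."""
    return [(i, j) for j in range(m) for i in range(n)]

def blocked(n, m, bi, bj):
    """Block (tile) both loops with assumed tile sizes bi x bj."""
    order = []
    for ii in range(0, n, bi):            # loop over tile rows
        for jj in range(0, m, bj):        # loop over tile columns
            for i in range(ii, min(ii + bi, n)):      # iterations inside one tile
                for j in range(jj, min(jj + bj, m)):
                    order.append((i, j))
    return order

def coalesced(n, m):
    """Coalesce the two loops into one loop of n*m iterations."""
    # divmod(k, m) recovers the original (i, j) indices from the single index k.
    return [divmod(k, m) for k in range(n * m)]

if __name__ == "__main__":
    n, m = 4, 4
    base = original(n, m)
    # Every transformation visits the same iterations; only the order differs.
    assert sorted(interchanged(n, m)) == base
    assert sorted(blocked(n, m, 2, 2)) == base
    assert coalesced(n, m) == base
    print(blocked(n, m, 2, 2)[:4])
```

In this sketch, blocking visits one tile at a time, which is the reordering typically used to improve locality, while coalescing keeps the original order but exposes a single loop whose iterations could be distributed across one level of a parallelism hierarchy.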