Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Loop quantization: a generalized loop unwinding technique
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and Environments for Parallel Programming
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors
IEEE Transactions on Computers
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Supercompilers for parallel and vector computers
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Time Optimal Linear Schedules for Algorithms with Uniform Dependencies
IEEE Transactions on Computers
Independent Partitioning of Algorithms with Uniform Dependencies
IEEE Transactions on Computers
The Organization of Computations for Uniform Recurrence Equations
Journal of the ACM (JACM)
The parallel execution of DO loops
Communications of the ACM
Optimizing Supercompilers for Supercomputers
Parallel Programming and Compilers
Structure of Computers and Computations
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays
IEEE Transactions on Parallel and Distributed Systems
Partitioning and Labeling of Loops by Unimodular Transformations
IEEE Transactions on Parallel and Distributed Systems
In this paper, an approach to exploiting parallelism within nested loops is proposed. The method first identifies all initially independent computations and then, based on them, derives valid partitioning bases for partitioning the entire iteration space of the loop nest. Because the shape of the iteration space is taken into account, pseudo-dependence relations are eliminated and hence more parallelism is exposed. Our approach provides a systematic method for maximizing the degree of fine- or coarse-grain parallelism, and it sidesteps the open question of how to combine different loop transformations to maximize parallelism. It is also shown that our approach can exploit more parallelism than related work and has several advantages over existing methods.
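The idea of splitting an iteration space into independent computations can be illustrated with a minimal sketch. This is not the paper's algorithm; it assumes the simplest possible case, a one-dimensional loop with a single constant dependence distance `d`, where iteration `i` depends on iteration `i - d`. Under that assumption, iterations in different residue classes modulo `d` never depend on one another, so each class forms an independent chain that could run on its own processor:

```python
def partition_iterations(n, d):
    """Partition iterations 0..n-1 of a loop with a single uniform
    dependence distance d into d independent chains.

    Iteration i depends only on iteration i - d, so all iterations
    sharing the same residue i % d form one sequential chain, and
    distinct chains carry no dependences between them.
    """
    chains = [[] for _ in range(d)]
    for i in range(n):
        chains[i % d].append(i)
    return chains

# Example: 10 iterations, dependence distance 3 -> 3 independent chains.
chains = partition_iterations(10, 3)
# chains == [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Each inner list must still execute in order (it carries the dependence), but the lists themselves are mutually independent, which is the coarse-grain parallelism such a partitioning exposes. Handling multi-dimensional iteration spaces, multiple distance vectors, and the space's boundary shape, as the paper addresses, requires substantially more machinery than this sketch.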