Allocating Independent Subtasks on Parallel Processors
IEEE Transactions on Software Engineering
Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Computer graphics: principles and practice (2nd ed.)
Semantical interprocedural parallelization: an overview of the PIPS project
ICS '91 Proceedings of the 5th international conference on Supercomputing
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient and exact data dependence analysis
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A practical algorithm for exact array dependence analysis
Communications of the ACM
Counting solutions to Presburger formulas: how and why
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Symbolic analysis for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Loop Transformations for Restructuring Compilers: The Foundations
Optimizing Supercompilers for Supercomputers
Dependence Analysis for Supercomputing
Partitioning and Labeling of Loops by Unimodular Transformations
IEEE Transactions on Parallel and Distributed Systems
The Power Test for Data Dependence
IEEE Transactions on Parallel and Distributed Systems
Symbolic Program Analysis and Optimization for Parallelizing Compilers
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Multithreading Runtime Support for Loop and Functional Parallelism
ISHPC '99 Proceedings of the Second International Symposium on High Performance Computing
A novel approach for partitioning iteration spaces with variable densities
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
A geometric approach for partitioning n-dimensional non-rectangular iteration spaces
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Cache-aware iteration space partitioning
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Cache-aware partitioning of multi-dimensional iteration spaces
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Parallel loops account for the largest share of the parallelism in scientific and numerical codes. For example, most of the DO loops in SPEC CFP2000 and SPEC OMPM2001 are of DOALL type and account for a large percentage of the total execution time. One way to exploit this parallelism is to partition the iteration space of a DOALL loop among the processors of a parallel system. A good partitioning is key to achieving high performance and to using multiprocessor systems efficiently. Although a significant amount of work has been done on partitioning and scheduling loops with both rectangular and non-rectangular iteration spaces, to the best of our knowledge the problem of partitioning loops with conditionals has not been addressed. In this paper, we present a mathematical model for partitioning parallel nested loops, both perfect and non-perfect, with conditionals whose expressions are affine functions of the outer loop indices. We present a loop transformation based on eliminating redundant constraints that bound the iteration space of a nested loop. The transformation plays a critical role in the (static) partitioning process because it captures the "exact" lower and upper bounds of the loop indices, each of which may be constant or symbolic. We generate a canonical form of the loop nest using the transformation and employ the geometric approach we proposed earlier (in [1, 2]) to partition the iteration space along the axis corresponding to the outermost loop. For cases in which such a transformation does not exist, we propose a general approach for loop canonicalization. We present several examples from the literature and numerical packages to illustrate the effectiveness of our approach.
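To make the idea concrete, the sketch below works through a small, hypothetical example; it is not the authors' algorithm. It assumes a DOALL nest `for i in 1..n: for j in 1..n: if j <= i: body`, whose guard is affine in the outer index `i`. Folding the guard into the inner bound gives `j <= min(n, i)`, and since `i <= n` the constraint `j <= n` is redundant, so the canonical nest is `for i in 1..n: for j in 1..i: body`. The outer axis is then cut into contiguous ranges of roughly equal work using a simple prefix-sum rule. All function names here are illustrative assumptions.

```python
def guarded_count(n):
    """Body executions of the original nest, with the guard evaluated explicitly."""
    return sum(1 for i in range(1, n + 1) for j in range(1, n + 1) if j <= i)


def row_work(i, n):
    """Body executions for outer index i in the canonical nest.

    The inner upper bound is min(n, i); because i <= n it reduces to i,
    i.e. the constraint j <= n is redundant.
    """
    return min(n, i)


def partition_outer(n, p):
    """Split the outer range 1..n into p contiguous ranges of similar work.

    Range k ends at the first row where the cumulative work reaches
    k/p of the total. Integer arithmetic avoids rounding issues.
    A range (start, start - 1) is empty; this can occur when p > n.
    """
    work = [row_work(i, n) for i in range(1, n + 1)]
    total = sum(work)
    parts, start, done = [], 1, 0
    for k in range(1, p + 1):
        end = start - 1
        while end < n and done * p < total * k:
            done += work[end]  # work[end] holds the work of row end + 1
            end += 1
        parts.append((start, end))
        start = end + 1
    return parts


def part_work(part, n):
    """Total work assigned to one (start, end) range of outer indices."""
    lo, hi = part
    return sum(row_work(i, n) for i in range(lo, hi + 1))


if __name__ == "__main__":
    n, p = 8, 2
    parts = partition_outer(n, p)
    print(parts, [part_work(q, n) for q in parts])
```

For `n = 8` and `p = 2`, an equal-width split of the outer loop, (1, 4) and (5, 8), assigns 10 and 26 body executions to the two processors, whereas the work-based split above assigns 21 and 15. The rule used here is deliberately simple; it only illustrates why exact bounds on the inner index are needed before the iteration space can be divided evenly.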