Coarse-grained loop parallelization: Iteration Space Slicing vs affine transformations

Authors:
Anna Beletska; Wlodzimierz Bielecki; Albert Cohen; Marek Palkowski; Krzysztof Siedlecki
Affiliations:
INRIA Saclay, 2-4, rue J. Monod, 91893 Orsay Cedex, France; West-Pomeranian Technical University, Computer Science Department, str. Zolnierska 52, 71210 Szczecin, Poland; INRIA Saclay, 2-4, rue J. Monod, 91893 Orsay Cedex, France; West-Pomeranian Technical University, Computer Science Department, str. Zolnierska 52, 71210 Szczecin, Poland; West-Pomeranian Technical University, Computer Science Department, str. Zolnierska 52, 71210 Szczecin, Poland
Venue:
Parallel Computing
Year:
2011

Abstract

Automatic coarse-grained parallelization of program loops is of great importance for parallel computing systems. This paper presents the theory of Iteration Space Slicing, which aims to extract the synchronization-free parallelism available in arbitrarily nested program loops. We demonstrate that Iteration Space Slicing algorithms can extract more coarse-grained parallelism than the Affine Transformation Framework, provided that the transitive closure of the union of relations describing all dependences in the affine loop can be calculated. Experimental results show that Iteration Space Slicing algorithms extract coarse-grained parallelism for many loops of the NAS and UTDSP benchmarks. Problems that remain to be resolved in order to enhance the theory of Iteration Space Slicing are also discussed.
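The role of the transitive closure can be illustrated on a small, concrete instance. What follows is a minimal sketch under stated assumptions, not the authors' algorithm. Dependences are enumerated explicitly for a fixed n rather than represented as parameterized affine relations. Whole loop iterations, not individual statement instances, are the unit of work. The example loop, the relations `r1` and `r2`, and the functions `transitive_closure` and `sync_free_slices` are all hypothetical. The sketch forms the union R of the dependence relations and computes R+ by saturation. It then starts a slice at every source (an iteration that is a dependence source but never a target) and merges slices whose reachable sets overlap. Each resulting slice could then run on its own processor without synchronization.

```python
def transitive_closure(rel):
    """Positive transitive closure R+ of a finite relation, given as a set of
    (source, target) pairs, computed by adding R+ o R until a fixpoint."""
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in rel if b == c}
        if step <= closure:
            return closure
        closure |= step


def sync_free_slices(iterations, relations):
    """Group iterations into synchronization-free slices (illustrative only).

    Assumes the dependence graph is acyclic (dependences point forward in
    execution order), so every dependent iteration is reachable from a source.
    """
    r = set().union(*relations)                 # R = union of all dependence relations
    r_plus = transitive_closure(r)
    targets = {d for (_, d) in r}
    sources = sorted({s for (s, _) in r} - targets)
    # R*(s) = {s} united with R+(s): everything that must follow s in its slice.
    reach = {s: {s} | {d for (a, d) in r_plus if a == s} for s in sources}

    # Union-find over sources: sources whose reach sets overlap share a slice.
    parent = {s: s for s in sources}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner = {}
    for s in sources:
        for it in reach[s]:
            if it in owner:
                parent[find(s)] = find(owner[it])
            else:
                owner[it] = s

    slices = {}
    for it in iterations:
        # Iterations touched by no dependence form singleton slices.
        key = find(owner[it]) if it in owner else it
        slices.setdefault(key, []).append(it)
    # Within a slice, iterations keep their original (sequential) order.
    return sorted(sorted(sl) for sl in slices.values())


# Hypothetical loop:  for i in 1..n: a[2*i] = a[i] + 1; b[i] = b[i-3] + 1
n = 8
iterations = list(range(1, n + 1))
r1 = {(i, 2 * i) for i in iterations if 2 * i <= n}  # flow dependence through a
r2 = {(i, i + 3) for i in iterations if i + 3 <= n}  # flow dependence through b
print(sync_free_slices(iterations, [r1, r2]))
```

For n = 8 this prints `[[1, 2, 4, 5, 7, 8], [3, 6]]`. Iteration 3 reaches only iteration 6, so two independent slices remain. Passing only `[r1]` yields four slices: `[1, 2, 4, 8]`, `[3, 6]`, `[5]` and `[7]`. This shows why the closure must be taken over the union of all relations: closing each relation separately would miss slices that are connected only through a mix of dependences.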