Transformations techniques for extracting parallelism in non-uniform nested loops

Authors:
Fawzy A. Torkey;Afaf A. Salah;Nahed M. El Desouky;Sahar A. Gomaa
Affiliations:
Kaferelsheikh University, Kaferelsheikh, Egypt;Mathematics Department, Faculty of Science, Al Azhar University, Nasr City, Egypt;Mathematics Department, Faculty of Science, Al Azhar University, Nasr City, Egypt;Mathematics Department, Faculty of Science, Al Azhar University, Nasr City, Egypt
Venue:
WSEAS Transactions on Computers
Year:
2008



Abstract

Executing a program on parallel machines requires not only finding sufficient parallelism in the program but also minimizing the synchronization and communication overheads of the parallelized program. Doing so improves performance by reducing the time needed to execute the program. Parallelizing and partitioning nested loops requires efficient iteration dependence analysis. Although many loop transformation techniques exist for nested loop partitioning, most of them perform poorly when parallelizing nested loops with non-uniform (irregular) dependences. In this paper, affine and unimodular transformations are applied to the problem of extracting parallelism from nested loops with non-uniform dependence vectors. To solve this problem, a few researchers have converted non-uniform nested loops to uniform nested loops and then extracted the parallelism. We propose applying the two approaches, affine and unimodular transformations, directly to extract and improve the parallelism in nested loops with non-uniform dependences. The study shows that the unimodular transformation is better than the affine transformation when the dependences in the nested loops exist within a single statement, whereas the affine transformation is more effective when the nested loops contain a sequence of statements and the dependences exist between different statements.
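To make the terms in the abstract concrete, the following minimal Python sketch enumerates the dependence distance vectors of a small two-level loop nest by brute force and checks whether a 2x2 integer matrix is a legal unimodular transformation for them. It is an illustration only, not the method of the paper: the loop nest (a write to A[2*i][j] and a read of A[i][j]), the bound n = 4, and all function names are assumptions chosen for the example. Only write/read pairs are examined, so output dependences are ignored.

```python
from itertools import product


def distance_vectors(n, write_idx, read_idx):
    """Brute-force the distance vectors between writes and reads of one array
    in a nest `for i in 1..n: for j in 1..n` (hypothetical example nest)."""
    iters = list(product(range(1, n + 1), repeat=2))
    writers = {}
    for p in iters:
        writers.setdefault(write_idx(*p), []).append(p)
    dists = set()
    for q in iters:
        for p in writers.get(read_idx(*q), []):
            if p == q:
                continue  # same iteration: not a loop-carried dependence
            # Tuples compare lexicographically, so the source is the earlier one.
            src, dst = (p, q) if p < q else (q, p)
            dists.add((dst[0] - src[0], dst[1] - src[1]))
    return dists


def is_unimodular(t):
    """A 2x2 integer matrix is unimodular when its determinant is +1 or -1."""
    (a, b), (c, d) = t
    return abs(a * d - b * c) == 1


def apply(t, d):
    """Multiply the 2x2 matrix t by the distance vector d."""
    return (t[0][0] * d[0] + t[0][1] * d[1],
            t[1][0] * d[0] + t[1][1] * d[1])


def is_legal(t, dists):
    """Legal if t is unimodular and every transformed distance vector
    remains lexicographically positive."""
    return is_unimodular(t) and all(apply(t, d) > (0, 0) for d in dists)


def inner_loop_parallel(t, dists):
    """After a legal transformation, the inner loop carries no dependence
    if the outer component of every transformed distance is positive."""
    return is_legal(t, dists) and all(apply(t, d)[0] > 0 for d in dists)


if __name__ == "__main__":
    # Assumed nest: A[2*i][j] = f(A[i][j]); the distance (i, 0) varies with i,
    # so the dependences are non-uniform.
    dists = distance_vectors(4, lambda i, j: (2 * i, j), lambda i, j: (i, j))
    print(sorted(dists))
    identity = [[1, 0], [0, 1]]
    interchange = [[0, 1], [1, 0]]
    print(inner_loop_parallel(identity, dists))
    print(is_legal(interchange, dists), inner_loop_parallel(interchange, dists))
```

In this assumed nest the distance vectors differ from iteration to iteration, which is what "non-uniform" means here. Loop interchange stays legal but moves the dependences to the inner loop, so a transformation has to be checked against every distance vector rather than a single constant one.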