The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase the parallelism of the nested loops in these applications. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only at the cost of complicated loop-bound and loop-index calculations. This paper proposes a new technique, loop striping, that maximizes parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where all iterations in a stripe are independent and can be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.
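The requirement that iterations grouped together be independent can be made concrete with a small check. The following Python sketch is not the paper's striping algorithm; it only tests whether a given grouping of a two-dimensional iteration space respects a set of uniform dependence vectors. The function name `is_legal_grouping`, the example groupings, and the convention that a vector `(di, dj)` means iteration `(i, j)` must finish before `(i + di, j + dj)` starts are all assumptions made for illustration.

```python
from itertools import product


def is_legal_grouping(group_of, deps, rows, cols):
    """Check a grouping of a rows x cols iteration space (illustrative only).

    group_of(i, j) returns the index of the group that runs iteration (i, j);
    groups run one after another, and iterations inside a group run in parallel.
    deps holds uniform dependence vectors (di, dj), assumed to mean that
    iteration (i, j) must complete before (i + di, j + dj) begins.
    The grouping is legal if every dependence goes to a strictly later group.
    """
    for i, j in product(range(rows), range(cols)):
        for di, dj in deps:
            ti, tj = i + di, j + dj
            # Ignore dependences that leave the iteration space.
            if 0 <= ti < rows and 0 <= tj < cols:
                if group_of(i, j) >= group_of(ti, tj):
                    return False
    return True


def row_groups(i, j):
    # Hypothetical grouping: each row is one parallel group.
    return i


def wavefront_groups(i, j):
    # Hypothetical grouping: each anti-diagonal is one parallel group.
    return i + j


cross_row_deps = [(1, 0), (1, 1)]  # every dependence crosses to the next row
in_row_deps = [(0, 1), (1, 0)]     # one dependence stays inside a row

print(is_legal_grouping(row_groups, cross_row_deps, 4, 4))
print(is_legal_grouping(row_groups, in_row_deps, 4, 4))
print(is_legal_grouping(wavefront_groups, in_row_deps, 4, 4))
```

In this sketch, whole rows can run in parallel only when no dependence stays within a row. Once an in-row dependence such as `(0, 1)` appears, the row grouping fails the check and a different grouping (here, anti-diagonals) is needed. How loop striping chooses its stripes is described in the paper itself and is not reproduced here.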