Scan detection and parallelization in "inherently sequential" nested loop programs

Authors: Yun Zou; Sanjay Rajopadhye
Affiliations: Colorado State University (both authors)
Venue: Proceedings of the Tenth International Symposium on Code Generation and Optimization
Year: 2012




Abstract

Most automatic parallelizers rely on detecting independent computations, and most of them can do nothing when there is a true dependence between computations. However, this limitation can be overcome for programs that perform prefix computations (scans). We present a method for automatically parallelizing such "inherently sequential" programs. Our method, which handles arbitrarily nested loops, identifies situations where the computation performed by the loop body is equivalent to a matrix-vector product over a semiring. We also handle mutually dependent variables in the loop. The method is implemented in a polyhedral program transformation and code generation system that generates OpenMP code. We also present strategies to improve the performance of the generated code, an analytical performance model for the expected speedup, and a method to choose the parallelization parameters optimally. We show experimentally that the scan parallelizations performed by our system are effective, yielding linear (iso-efficient) speedup in situations where no other parallelism is available.
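To make the matrix-vector view concrete, here is a minimal sketch, not the code the authors' system generates. It assumes a loop that updates two mutually dependent variables, v_i = M_i v_{i-1} with v = (x, y) and each M_i a 2x2 matrix over the ordinary (+, *) semiring. Matrix multiplication is associative, so the sequence of prefix products can be computed as a scan. The sketch uses a common three-phase blocked scan: each chunk composes its matrices in parallel, a short sequential pass over the chunk summaries yields each chunk's starting vector, and each chunk then replays the original loop in parallel. The names `Mat2`, `seq_recurrence`, `scan_recurrence`, and `NCHUNK` are hypothetical. Without an OpenMP-enabled compiler the pragmas are ignored and the code runs sequentially, but it still produces the same results.

```c
#include <assert.h>

/* Hypothetical chunk count (e.g. one chunk per thread). */
#define NCHUNK 4

/* A 2x2 matrix over the ordinary (+, *) semiring. */
typedef struct { double m[2][2]; } Mat2;

static Mat2 mat_identity(void) {
    Mat2 r = {{{1.0, 0.0}, {0.0, 1.0}}};
    return r;
}

/* r = p * q; associativity of this product is what makes a scan legal. */
static Mat2 mat_mul(Mat2 p, Mat2 q) {
    Mat2 r;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r.m[i][j] = p.m[i][0] * q.m[0][j] + p.m[i][1] * q.m[1][j];
    return r;
}

/* r = a * v; temporaries make it safe when r and v alias. */
static void mat_vec(const Mat2 *a, const double v[2], double r[2]) {
    double x = a->m[0][0] * v[0] + a->m[0][1] * v[1];
    double y = a->m[1][0] * v[0] + a->m[1][1] * v[1];
    r[0] = x;
    r[1] = y;
}

/* The "inherently sequential" loop: out[i] = M[i] * out[i-1], starting from v0. */
static void seq_recurrence(int n, const Mat2 *M, const double v0[2], double (*out)[2]) {
    double v[2] = {v0[0], v0[1]};
    for (int i = 0; i < n; i++) {
        mat_vec(&M[i], v, v);
        out[i][0] = v[0];
        out[i][1] = v[1];
    }
}

/* First index of chunk c when n iterations are split into NCHUNK chunks. */
static int chunk_lo(int n, int c) {
    return (int)((long long)n * c / NCHUNK);
}

/* Same result as seq_recurrence, computed as a blocked parallel scan. */
static void scan_recurrence(int n, const Mat2 *M, const double v0[2], double (*out)[2]) {
    assert(n >= 0);
    Mat2 summary[NCHUNK];
    double start[NCHUNK][2];

    /* Phase 1 (parallel): summary[c] = M[hi-1] * ... * M[lo] for each chunk. */
    #pragma omp parallel for
    for (int c = 0; c < NCHUNK; c++) {
        Mat2 s = mat_identity();
        for (int i = chunk_lo(n, c); i < chunk_lo(n, c + 1); i++)
            s = mat_mul(M[i], s);
        summary[c] = s;
    }

    /* Phase 2 (sequential, only NCHUNK steps): starting vector of each chunk. */
    double v[2] = {v0[0], v0[1]};
    for (int c = 0; c < NCHUNK; c++) {
        start[c][0] = v[0];
        start[c][1] = v[1];
        mat_vec(&summary[c], v, v);
    }

    /* Phase 3 (parallel): each chunk replays the original loop from its start. */
    #pragma omp parallel for
    for (int c = 0; c < NCHUNK; c++) {
        int lo = chunk_lo(n, c);
        seq_recurrence(chunk_lo(n, c + 1) - lo, M + lo, start[c], out + lo);
    }
}
```

Choosing M_i = [[a_i, b_i], [0, 1]] and v_0 = (x_0, 1) turns the same code into a parallelization of the scalar recurrence x_i = a_i x_{i-1} + b_i. Note that the blocked scheme does more arithmetic than the sequential loop (a matrix-matrix product per element in phase 1 plus the replay in phase 3), so its speedup depends on the chunk count and the machine; results may also differ from the sequential loop by floating-point rounding.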