Scans as Primitive Parallel Operations
IEEE Transactions on Computers
Parallelizing complex scans and reductions
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
ICS '94 Proceedings of the 8th international conference on Supercomputing
Parametric Analysis of Polyhedral Iteration Spaces
Journal of VLSI Signal Processing Systems - Special issue on application specific systems, architectures and processors
The Organization of Computations for Uniform Recurrence Equations
Journal of the ACM (JACM)
Journal of the ACM (JACM)
Detection of Recurrences in Sequential Programs with Loops
PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe
Automatic Parallelization in the Polytope Model
The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Towards automatic parallelization of tree reductions in dynamic programming
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Scan primitives for GPU computing
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations
IEEE Transactions on Computers
Speculative parallelization of partial reduction variables
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Automatic parallelization via matrix multiplication
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
High-speed parallel Viterbi decoding: algorithm and VLSI-architecture
IEEE Communications Magazine
Hi-index | 0.00 |
Most automatic parallelizers are based on detection of independent computations, and most of them cannot do anything if there is a true dependence between computations. However, this can be surmounted for programs that perform prefix computations (scans). We present a method for automatically parallelizing such "inherently sequential" programs. Our method, which handles arbitrarily nested loops, identifies situations where the computation performed by the loop body is equivalent to a matrix vector product over a semi-ring. We also deal with mutually dependent variables in the loop. Our method is implemented in a polyhedral program transformation and code generation system and generates OpenMP code. We also present strategies to improve the performance of the generated code, an analytical performance model for the expected speedup, as well as a method to choose the parallelization parameters optimally. We show experimentally that the scan parallelizations performed by our system are effective, yielding linear (iso-efficient) speedup in situations where no other parallelism is available.