Speculative Parallelization of Partially Parallel Loops

Authors:
Francis H. Dang; Lawrence Rauchwerger
Venue:
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Year:
2000

Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because these loops have access patterns that are complex or insufficiently defined at compile time. We have previously proposed a framework for identifying such loops: we speculatively executed a loop as a doall and applied a fully parallel data dependence test to determine whether it had any cross-processor dependences; if the test failed, the loop was re-executed serially. While this method exploits doall parallelism well, it can cause slowdowns for loops with even one cross-processor flow dependence, because the entire loop must then be re-executed sequentially. Moreover, it does not exploit the partial parallelism that such loops still contain. In this paper we propose a generalization of our speculative doall parallelization technique, named the Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the overhead of the run-time dependence test itself, i.e., it removes the time lost to incorrect parallel execution. For fully serial loops, the asymptotic time complexity is equal to the sequential execution time. We present the base algorithm and an analysis of the different heuristics for its practical application. Preliminary experimental results on loops from Track show the performance of this new technique.
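
The speculate, test, and re-execute cycle described in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' implementation: the loop body `a[writes[i]] = a[reads[i]] + 1`, the function names `run_serial` and `recursive_speculation`, the block scheduling, and the rule of committing every block that precedes the first detected cross-block flow dependence are all hypothetical choices made for illustration. The sketch also checks only flow dependences through read and write sets; it does not model the privatization or reduction handling of the actual LRPD test.

```python
def run_serial(a, reads, writes):
    """Reference semantics: iteration i performs a[writes[i]] = a[reads[i]] + 1."""
    a = list(a)
    for r, w in zip(reads, writes):
        a[w] = a[r] + 1
    return a


def recursive_speculation(a, reads, writes, procs=4):
    """Simulate repeated speculate / test / commit rounds over contiguous blocks.

    Hypothetical helper for illustration. Returns (final_array, rounds).
    """
    a = list(a)
    n = len(reads)
    start, rounds = 0, 0
    while start < n:
        rounds += 1
        size = -(-(n - start) // procs)  # ceiling division
        blocks = [range(lo, min(lo + size, n)) for lo in range(start, n, size)]

        # Speculative phase: each block (one per simulated processor) writes
        # into a private buffer and records every location it read before
        # writing it itself ("exposed" reads of the pre-round shared state).
        results = []
        for block in blocks:
            private, exposed_reads = {}, set()
            for i in block:
                r, w = reads[i], writes[i]
                if r in private:
                    value = private[r]
                else:
                    exposed_reads.add(r)
                    value = a[r]
                private[w] = value + 1
            results.append((private, exposed_reads))

        # Test and commit phase: commit blocks in order until one of them has
        # an exposed read of a location written by an earlier block in this
        # round (a cross-block flow dependence). The first block never fails,
        # so every round makes progress.
        written_earlier = set()
        start = n
        for block, (private, exposed_reads) in zip(blocks, results):
            if exposed_reads & written_earlier:
                start = block.start  # re-execute from the first failing block
                break
            for loc, value in private.items():
                a[loc] = value
            written_earlier.update(private)
    return a, rounds
```

With eight iterations on four simulated processors, `reads = [0, 1, 2, 3, 4, 5, 1, 7]` and `writes = [0, 1, 2, 3, 4, 5, 6, 7]` contain a single cross-block flow dependence (iteration 6 reads what iteration 1 wrote). In this sketch the first three blocks commit in the first round and only the last block is redone, so the loop finishes in two rounds with the same result as `run_serial`, instead of falling back to a full sequential re-execution.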