Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because those loops have complex access patterns or access patterns that are insufficiently defined at compile time. In our previously proposed framework, a loop is speculatively executed as a doall and a fully parallel data dependence test is applied to determine whether it had any cross-processor dependences; if the test fails, the loop is re-executed serially. While this method exploits doall parallelism well, it can cause slowdowns for loops with even a single cross-processor flow dependence, because the entire loop must then be re-executed sequentially. Moreover, the partial parallelism that such loops do contain is not exploited. We now propose a generalization of our speculative doall parallelization technique, called the Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the overhead of the run-time dependence test itself. In this paper we present the base algorithm, an analysis of heuristics for its practical application, and experimental results on loops from Track, Spice, and FMA3D.
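To make the speculate / test / partial-commit / retry cycle concrete, below is a minimal sketch in C with OpenMP. It is not the authors' implementation: the stand-in loop body, the shadow-array layout, and all names (run_speculative, first_unsafe_iteration, the patterns w[] and r[]) are hypothetical; privatization and reduction recognition are omitted; the critical section exists only to keep the sketch short and race-free, whereas an LRPD-style test marks per-processor shadow structures to stay parallel; and, unlike the actual Recursive LRPD test, this variant conservatively commits only iterations not involved in any detected dependence.

```c
#include <stdio.h>
#include <string.h>

#define N 12   /* iterations of the candidate loop */
#define M 16   /* size of the array accessed through w[], r[] */

static double a[M];                                   /* shared array  */
static int w[N] = {0,1,2,3,4,5,6,7,8,9,10,11};        /* write pattern */
static int r[N] = {12,13,14,15,12,13,14,15,2,3,14,15};/* read pattern  */

/* Shadow arrays: earliest/latest iteration writing or reading a[k]. */
static int wr_min[M], wr_max[M], rd_min[M], rd_max[M];

/* Speculatively execute iterations [lo,hi) as a doall on a working copy,
 * marking every access in the shadow arrays. */
static void run_speculative(double *work, int lo, int hi)
{
    for (int k = 0; k < M; k++) {
        wr_min[k] = rd_min[k] = hi;      /* "never accessed" sentinel */
        wr_max[k] = rd_max[k] = lo - 1;
    }
#pragma omp parallel for
    for (int i = lo; i < hi; i++) {
#pragma omp critical
        {
            if (i < rd_min[r[i]]) rd_min[r[i]] = i;
            if (i > rd_max[r[i]]) rd_max[r[i]] = i;
            if (i < wr_min[w[i]]) wr_min[w[i]] = i;
            if (i > wr_max[w[i]]) wr_max[w[i]] = i;
            work[w[i]] = work[r[i]] + 1.0;   /* stand-in loop body */
        }
    }
}

/* Analyse the shadow arrays: return hi if the chunk was fully parallel,
 * otherwise the earliest iteration involved in a cross-iteration dependence. */
static int first_unsafe_iteration(int lo, int hi)
{
    int f = hi;
    for (int k = 0; k < M; k++) {
        int written = wr_min[k] < hi, read = rd_min[k] < hi;
        int same_iter = written && read &&
                        wr_min[k] == wr_max[k] && rd_min[k] == rd_max[k] &&
                        wr_min[k] == rd_min[k];
        int conflict = (written && read && !same_iter)      /* flow/anti dep */
                    || (written && wr_min[k] != wr_max[k]);  /* output dep    */
        if (conflict) {
            if (wr_min[k] < f) f = wr_min[k];
            if (read && rd_min[k] < f) f = rd_min[k];
        }
    }
    return f;
}

int main(void)
{
    double work[M];
    int lo = 0;
    for (int k = 0; k < M; k++) a[k] = (double)k;

    while (lo < N) {
        memcpy(work, a, sizeof a);       /* speculation runs on a copy, so  */
        run_speculative(work, lo, N);    /* a itself never needs rollback   */
        int f = first_unsafe_iteration(lo, N);
        /* Commit the conflict-free prefix [lo,f): elements it wrote were
         * touched by no other iteration, so they hold the sequential values. */
        for (int k = 0; k < M; k++)
            if (wr_min[k] < f) a[k] = work[k];
        if (f == lo) {                   /* no parallel progress: execute   */
            a[w[lo]] = a[r[lo]] + 1.0;   /* one iteration sequentially      */
            f = lo + 1;
        }
        lo = f;                          /* re-speculate on the remainder   */
    }
    for (int k = 0; k < M; k++) printf("a[%d] = %g\n", k, a[k]);
    return 0;
}
```

With this access pattern the first speculative pass commits iterations 0 and 1, the dependence sources 2 and 3 are replayed sequentially, and the remaining iterations 4 through 11 then succeed as a fully parallel chunk. In the worst case every chunk fails immediately and execution degenerates to the sequential loop plus the marking and analysis work, which illustrates the slowdown bound claimed in the abstract.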