IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Design Space Exploration of a Software Speculative Parallelization Scheme
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Computers
A lightweight in-place implementation for software thread-level speculation
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Probabilistic program analysis for parallelizing compilers
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Fastpath speculative parallelization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Speculative separation for privatization and reductions
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Financial software on GPUs: between Haskell and Fortran
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Optimizing software runtime systems for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Speculative parallelization: eliminating the overhead of failure
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Hi-index | 0.00 |
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. In our previously proposed framework we have speculatively executed a loop as a doall, and applied a fully parallel data dependence test to determine if it had any cross-processor dependences; If the test failed, then the loop was re-executed serially. While this method exploits doall parallelism well, it can cause slowdowns for loops with even one cross-processor flow dependence because we have to re-execute sequentially. Moreover, the existing, partial parallelism of loops is not exploited. We now propose a generalization of our speculative doall parallelization technique, called the Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the overhead of the run-time dependence test itself. In this paper we present the base algorithm and an analysis of the different heuristics for its practical application and a few experimental results on loops from Track, Spice, and FMA3D.