An efficient algorithm for the run-time parallelization of DOACROSS loops

Authors:
Ding Kai Chen;Josep Torrellas;Pen Chung Yew
Affiliations:
University of Illinois at Urbana-Champaign, IL;University of Illinois at Urbana-Champaign, IL;University of Illinois at Urbana-Champaign, IL
Venue:
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Year:
1994

Citing 12
Cited 22

A Scheme to Enforce Data Dependence on Large Multiprocessor Systems

IEEE Transactions on Software Engineering
Loop skewing: the wavefront method revisited

International Journal of Parallel Programming
Compiler algorithms for synchronization

IEEE Transactions on Computers
An approach to synchronization for parallel computing

ICS '88 Proceedings of the 2nd international conference on Supercomputing
Advanced loop optimizations for parallel computers

Proceedings of the 1st International Conference on Supercomputing
Run-Time Parallelization and Scheduling of Loops

IEEE Transactions on Computers
Improving the performance of runtime parallelization

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The cedar system and an initial performance study

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Dependence Analysis for Supercomputing

Dependence Analysis for Supercomputing
Dependence Uniformization: A Loop Parallelization Technique

IEEE Transactions on Parallel and Distributed Systems
Multiprocessors: discussion of some theoretical and practical problems

Multiprocessors: discussion of some theoretical and practical problems
Optimizing supercompilers for supercomputers

Optimizing supercompilers for supercomputers

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Run-time methods for parallelizing partially parallel loops

ICS '95 Proceedings of the 9th international conference on Supercomputing
On Effective Execution of Nonuniform DOACROSS Loops

IEEE Transactions on Parallel and Distributed Systems
Coarse-grained speculative execution in shared-memory multiprocessors

ICS '98 Proceedings of the 12th international conference on Supercomputing
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

IEEE Transactions on Parallel and Distributed Systems
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences

IEEE Transactions on Parallel and Distributed Systems
Coarse-Grained Thread Pipelining: A Speculative Parallel Execution Model for Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Techniques for speculative run-time parallelization of loops

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Time-Stamping Algorithms for Parallelization of Loops at Run-Time

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Parameter-Induced Aliasing in Ada

Ada Europe '01 Proceedings of the 6th Ade-Europe International Conference Leuven on Reliable Software Technologies
Improving Locality in the Parallelization of Doacross Loops (Research Note)

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
An Efficient Run-Time Scheme for Exploiting Parallelism on Multiprocessor Systems

HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
The Illinois Aggressive Coma Multiprocessor project (I-ACOMA)

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
New OpenMP directives for irregular data access loops

Scientific Programming
Sensitivity analysis for automatic parallelization on multi-cores

Proceedings of the 21st annual international conference on Supercomputing
Implementation of Sensitivity Analysis for Automatic Parallelization

Languages and Compilers for Parallel Computing
Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Predecessor/successor approach for high-performance run-time wavefront scheduling

Information Sciences: an International Journal
An adaptive scheme for dynamic parallelization

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Transparent runtime parallelization of the R scripting language

Journal of Parallel and Distributed Computing
Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.00

Visualization

Abstract

While automatic parallelization of loops usually relies on compile-time analysis of data dependences, for some loops the data dependences cannot be determined at compile time. An example is loops accessing arrays with subscripted subscripts. To parallelize these loops, it is necessary to perform run-time analysis. In this paper, we present a new algorithm to parallelize these loops at run time. Our scheme handles any type of data dependence in the loop without requiring any special architectural support in the multiprocessor. Furthermore, compared to an older scheme with the same generality, our scheme significantly reduces the amount of processor communication required and increases the overlap among dependent iterations. We evaluate our algorithm with parameterized loops running on the 32-processor Cedar shared-memory multiprocessor. The results show speedups over the serial code of up to 14 with the full overhead of run-time analysis and of up to 27 if part of the analysis is reused across loop invocations. Moreover, the algorithm outperforms the older scheme in nearly all cases, reaching speedups of up to times when the loop has many dependences.