A Scheme to Enforce Data Dependence on Large Multiprocessor Systems
IEEE Transactions on Software Engineering
Loop skewing: the wavefront method revisited
International Journal of Parallel Programming
Compiler algorithms for synchronization
IEEE Transactions on Computers
An approach to synchronization for parallel computing
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Advanced loop optimizations for parallel computers
Proceedings of the 1st International Conference on Supercomputing
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
Improving the performance of runtime parallelization
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The cedar system and an initial performance study
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Dependence Analysis for Supercomputing
Dependence Analysis for Supercomputing
Dependence Uniformization: A Loop Parallelization Technique
IEEE Transactions on Parallel and Distributed Systems
Multiprocessors: discussion of some theoretical and practical problems
Multiprocessors: discussion of some theoretical and practical problems
Optimizing supercompilers for supercomputers
Optimizing supercompilers for supercomputers
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Run-time methods for parallelizing partially parallel loops
ICS '95 Proceedings of the 9th international conference on Supercomputing
On Effective Execution of Nonuniform DOACROSS Loops
IEEE Transactions on Parallel and Distributed Systems
Coarse-grained speculative execution in shared-memory multiprocessors
ICS '98 Proceedings of the 12th international conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Techniques for speculative run-time parallelization of loops
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Time-Stamping Algorithms for Parallelization of Loops at Run-Time
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Parameter-Induced Aliasing in Ada
Ada Europe '01 Proceedings of the 6th Ade-Europe International Conference Leuven on Reliable Software Technologies
Improving Locality in the Parallelization of Doacross Loops (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
An Efficient Run-Time Scheme for Exploiting Parallelism on Multiprocessor Systems
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
The Illinois Aggressive Coma Multiprocessor project (I-ACOMA)
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
New OpenMP directives for irregular data access loops
Scientific Programming
Sensitivity analysis for automatic parallelization on multi-cores
Proceedings of the 21st annual international conference on Supercomputing
Implementation of Sensitivity Analysis for Automatic Parallelization
Languages and Compilers for Parallel Computing
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Predecessor/successor approach for high-performance run-time wavefront scheduling
Information Sciences: an International Journal
An adaptive scheme for dynamic parallelization
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Transparent runtime parallelization of the R scripting language
Journal of Parallel and Distributed Computing
Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
While automatic parallelization of loops usually relies on compile-time analysis of data dependences, for some loops the data dependences cannot be determined at compile time. An example is loops accessing arrays with subscripted subscripts. To parallelize these loops, it is necessary to perform run-time analysis. In this paper, we present a new algorithm to parallelize these loops at run time. Our scheme handles any type of data dependence in the loop without requiring any special architectural support in the multiprocessor. Furthermore, compared to an older scheme with the same generality, our scheme significantly reduces the amount of processor communication required and increases the overlap among dependent iterations. We evaluate our algorithm with parameterized loops running on the 32-processor Cedar shared-memory multiprocessor. The results show speedups over the serial code of up to 14 with the full overhead of run-time analysis and of up to 27 if part of the analysis is reused across loop invocations. Moreover, the algorithm outperforms the older scheme in nearly all cases, reaching speedups of up to times when the loop has many dependences.