A Scheme to Enforce Data Dependence on Large Multiprocessor Systems
IEEE Transactions on Software Engineering
Compiler algorithms for synchronization
IEEE Transactions on Computers
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
Improving the performance of runtime parallelization
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Introduction to parallel computing: design and analysis of algorithms
Introduction to parallel computing: design and analysis of algorithms
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
A scalable method for run-time loop parallelization
International Journal of Parallel Programming
On Effective Execution of Nonuniform DOACROSS Loops
IEEE Transactions on Parallel and Distributed Systems
On the Automatic Parallelization of the Perfect Benchmarks®
IEEE Transactions on Parallel and Distributed Systems
Cyclic Staggered Scheme: A Loop Allocation Policy for DOACROSS Loops
IEEE Transactions on Computers
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Run-time parallelization: its time has come
Parallel Computing - Special issues on languages and compilers for parallel computers
Nonlinear and Symbolic Data Dependence Testing
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Redundant Synchronization Elimination for DOACROSS Loops
IEEE Transactions on Parallel and Distributed Systems
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
The Superthreaded Processor Architecture
IEEE Transactions on Computers
Techniques for speculative run-time parallelization of loops
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Load Balancing in Parallel Computers: Theory and Practice
Load Balancing in Parallel Computers: Theory and Practice
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
An efficient algorithm for the run-time parallelization of DOACROSS loops
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Parallelizing Molecular Dynamics Programs for Distributed-Memory Machines
IEEE Computational Science & Engineering
Dependence Uniformization: A Loop Parallelization Technique
IEEE Transactions on Parallel and Distributed Systems
A memory-layout oriented run-time technique for locality optimization
ICPP '98 Proceedings of the 1998 International Conference on Parallel Processing
Effects of Parallelism Degree on Run-Time Parallelization of Loops
HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences-Volume 7 - Volume 7
A Loop Allocation Policy for DOACROSS Loops
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor
Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor
Improving Locality in the Parallelization of Doacross Loops (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Towards Detection of Coarse-Grain Loop-Level Parallelism in Irregular Computations
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A GSA-based compiler infrastructure to extract parallelism from complex loops
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
An inspector-executor algorithm for irregular assignment parallelization
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
HELIX: automatic parallelization of irregular programs for chip multiprocessing
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
This paper presents a time stamp algorithm for runtime parallelization of general DOACROSS loops that have indirect access patterns. The algorithm follows the INSPECTOR/EXECUTOR scheme and exploits parallelism at a fine-grained memory reference level. It features a parallel inspector and improves upon previous algorithms of the same generality by exploiting parallelism among consecutive reads of the same memory element. Two variants of the algorithm are considered: One allows partially concurrent reads (PCR) and the other allows fully concurrent reads (FCR). Analyses of their time complexities derive a necessary condition with respect to the iteration workload for runtime parallelization. Experimental results for a Gaussian elimination loop, as well as an extensive set of synthetic loops on a 12-way SMP server, show that the time stamp algorithms outperform iteration-level parallelization techniques in most test cases and gain speedups over sequential execution for loops that have heavy iteration workloads. The PCR algorithm performs best because it makes a better trade-off between maximizing the parallelism and minimizing the analysis overhead. For loops with light or unknown iteration loads, an alternative speculative runtime parallelization technique is preferred.