Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences

Authors: Cheng-Zhong Xu; Vipin Chaudhary
Affiliations: Wayne State Univ., Detroit, MI; Wayne State Univ., Detroit, MI
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 2001

Abstract

This paper presents a time stamp algorithm for runtime parallelization of general DOACROSS loops that have indirect access patterns. The algorithm follows the INSPECTOR/EXECUTOR scheme and exploits parallelism at a fine-grained memory reference level. It features a parallel inspector and improves upon previous algorithms of the same generality by exploiting parallelism among consecutive reads of the same memory element. Two variants of the algorithm are considered: one allows partially concurrent reads (PCR) and the other allows fully concurrent reads (FCR). Analysis of their time complexities yields a necessary condition, in terms of the iteration workload, for runtime parallelization. Experimental results for a Gaussian elimination loop, as well as for an extensive set of synthetic loops on a 12-way SMP server, show that the time stamp algorithms outperform iteration-level parallelization techniques in most test cases and achieve speedups over sequential execution for loops with heavy iteration workloads. The PCR algorithm performs best because it strikes a better trade-off between maximizing parallelism and minimizing analysis overhead. For loops with light or unknown iteration loads, an alternative speculative runtime parallelization technique is preferred.
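
The inspector/executor idea in the abstract can be illustrated with a small sketch. This is not the paper's PCR or FCR algorithm, whose details the abstract does not give. It is a simplified, single-threaded illustration under stated assumptions: the loop has the hypothetical form `a[writes[i]] = f(a[reads[i]])`, the inspector runs sequentially (the paper's inspector is parallel), and the executor is simulated by sweeping over iterations in an arbitrary order instead of using real threads. The names `inspect`, `execute`, and `sequential` are invented for this example. The one property it aims to show is that consecutive reads of the same element share a time stamp, so they need not wait for each other.

```python
from collections import defaultdict


def inspect(reads, writes):
    """Inspector: give every memory reference a per-element time stamp.

    Consecutive reads of one element share a stamp; every write gets a new one.
    """
    cur = defaultdict(int)          # latest stamp issued for each element
    last_kind = {}                  # 'R' or 'W': kind of the latest reference
    group_size = defaultdict(int)   # (element, stamp) -> number of references
    r_stamp, w_stamp = [], []
    for r, w in zip(reads, writes):
        if last_kind.get(r) != 'R':  # first read after a write opens a new group
            cur[r] += 1
        last_kind[r] = 'R'
        group_size[(r, cur[r])] += 1
        r_stamp.append(cur[r])

        cur[w] += 1                  # a write always advances the stamp
        last_kind[w] = 'W'
        group_size[(w, cur[w])] += 1
        w_stamp.append(cur[w])
    return r_stamp, w_stamp, group_size


def execute(a, reads, writes, f, order):
    """Simulated executor: a reference runs once its element's clock
    equals the reference's stamp. `order` is the order in which the
    iterations are attempted on each sweep."""
    a = list(a)
    r_stamp, w_stamp, group_size = inspect(reads, writes)
    clock = defaultdict(lambda: 1)   # current stamp being served per element
    done = defaultdict(int)          # references finished in the current group

    def retire(x):
        done[x] += 1
        if done[x] == group_size[(x, clock[x])]:
            clock[x] += 1            # whole group finished: move to next stamp
            done[x] = 0

    value = {}                       # iteration -> value obtained by its read
    pending = list(order)
    while pending:
        still = []
        for i in pending:
            if i not in value:
                if clock[reads[i]] != r_stamp[i]:
                    still.append(i)  # read not yet allowed
                    continue
                value[i] = a[reads[i]]
                retire(reads[i])
            if clock[writes[i]] == w_stamp[i]:
                a[writes[i]] = f(value[i])
                retire(writes[i])
            else:
                still.append(i)      # write not yet allowed
        pending = still
    return a


def sequential(a, reads, writes, f):
    """Reference result: the original loop executed in program order."""
    a = list(a)
    for r, w in zip(reads, writes):
        a[w] = f(a[r])
    return a
```

For example, with `reads = [0, 0, 0, 1, 2]` and `writes = [1, 2, 3, 0, 0]`, the first three iterations all read element 0 under the same stamp, and attempting the iterations in reverse order still reproduces the sequential result. A real executor would replace the sweep with one thread per iteration (or block of iterations) and would wait on the per-element clock.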