High-performance computing capability is crucial for the advanced calculations of scientific applications. A parallelizing compiler takes a sequential program as input and automatically translates it into a parallel form. However, for loops whose arrays have irregular (i.e., indirectly indexed), nonlinear, or dynamic access patterns, no state-of-the-art compiler can determine the parallelism at compile time. In this paper, we propose an efficient run-time scheme that computes a highly parallel execution schedule for such loops. The scheme first constructs a predecessor iteration table in the inspector phase, and then schedules the loop iterations into wavefronts for parallel execution. Whereas the performance of existing inspector/executor methods usually degrades dramatically on non-uniform access patterns, our scheme does not suffer from this problem. Furthermore, its high scalability and low overhead make it especially suitable for large multiprocessor systems.
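To make the idea concrete, the following is a minimal sketch of such an inspector, not the paper's implementation, assuming a target loop of the form for (i = 0; i < n; i++) A[idx[i]] += f(i) with idx[] known only at run time. Each iteration's predecessor is the most recent earlier iteration that touched the same array element, and its wavefront number is one more than its predecessor's, so iterations within a wavefront are mutually independent. The names build_wavefronts, last, and wavefront are illustrative, not identifiers from the paper.

```c
#include <stdio.h>
#include <stdlib.h>

/* Inspector sketch: compute a wavefront number for every iteration of a
 * loop that accesses A[idx[i]].  idx has n entries ranging over m array
 * elements; wavefront must have room for n ints.  Returns the number of
 * wavefronts, or -1 on allocation failure. */
int build_wavefronts(const int *idx, int n, int m, int *wavefront)
{
    int *last = malloc(m * sizeof *last);   /* last iteration touching element e */
    int max_wf = 0;
    if (!last) return -1;
    for (int e = 0; e < m; e++) last[e] = -1;

    for (int i = 0; i < n; i++) {
        int p = last[idx[i]];               /* predecessor iteration, -1 if none */
        wavefront[i] = (p < 0) ? 0 : wavefront[p] + 1;
        if (wavefront[i] > max_wf) max_wf = wavefront[i];
        last[idx[i]] = i;                   /* i becomes the new predecessor */
    }
    free(last);
    return max_wf + 1;
}

int main(void)
{
    int idx[] = {0, 2, 0, 1, 2, 0};         /* an irregular access pattern */
    int wavefront[6];
    int n_wf = build_wavefronts(idx, 6, 3, wavefront);

    /* Executor sketch: iterations in the same wavefront are independent and
     * could run in parallel, e.g. under an OpenMP parallel-for. */
    for (int w = 0; w < n_wf; w++) {
        printf("wavefront %d:", w);
        for (int i = 0; i < 6; i++)
            if (wavefront[i] == w)
                printf(" %d", i);           /* the loop body for iteration i runs here */
        printf("\n");
    }
    return 0;
}
```

In this sketch the inspector is a single O(n) pass over the index array, and the executor then sweeps the wavefronts in order, running each wavefront's iterations concurrently; under the stated assumptions, the cost of the inspector does not depend on how uniform the access pattern is.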