On Effective Execution of Nonuniform DOACROSS Loops

Authors: Ding-Kai Chen; Pen-Chung Yew
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 1996

Abstract

DOACROSS loops with nonuniform loop-carried dependences are extremely difficult to parallelize. In this paper, we present a static scheduling scheme with an accompanying synchronization strategy that executes such loops effectively and efficiently. Our approach builds on a parallelization technique called Dependence Uniformization, which finds a small set of uniform dependence vectors that covers all possible nonuniform dependences in a DOACROSS loop. It differs from previous schemes in that we demonstrate a better way to select the uniform dependence vectors. When used with the Static Strip Scheduling scheme, the proposed uniform dependence vector set allows dependences to be enforced with more locality, which considerably reduces the need for explicit synchronization while retaining most of the parallelism. This paper describes the strategy for selecting uniform dependence vectors and the static strip scheduling scheme. Performance analysis and examples are also presented.
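To make the idea of "covering" nonuniform dependences concrete, the following Python sketch enumerates the dependence distance vectors of a small two-level loop nest by brute force and then checks that each one is a nonnegative integer combination of a candidate uniform set. Everything here is an illustrative assumption rather than the paper's method: the array subscripts, the loop bound `n = 4`, the helper names `distance_vectors` and `covered`, and especially the candidate set `basis`, which is a deliberately naive choice and not the selection strategy the abstract refers to.

```python
from itertools import product


def distance_vectors(n, write_idx, read_idx):
    """Brute-force the dependence distance vectors of a 2-deep loop nest.

    Iteration (i, j), with 1 <= i, j <= n, writes A[write_idx(i, j)] and
    reads A[read_idx(i, j)]. Two iterations are dependent when they touch
    the same element and at least one of them writes it.
    """
    iters = list(product(range(1, n + 1), repeat=2))  # lexicographic order
    writers = {}
    for it in iters:
        writers.setdefault(write_idx(*it), []).append(it)

    dists = set()
    for it in iters:
        for elem in (read_idx(*it), write_idx(*it)):
            for w in writers.get(elem, []):
                if w != it:
                    # The lexicographically earlier iteration is the source.
                    src, dst = sorted((w, it))
                    dists.add((dst[0] - src[0], dst[1] - src[1]))
    return dists


def covered(d, basis, limit):
    """True if d is a nonnegative integer combination of the basis vectors.

    Coefficients are searched exhaustively in the range 0..limit.
    """
    for coeffs in product(range(limit + 1), repeat=len(basis)):
        combo = tuple(
            sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(len(d))
        )
        if combo == d:
            return True
    return False


n = 4

# Hypothetical loop body with coupled subscripts:
#   A[i + j][3*i + j + 3] = ... A[i + j + 1][i + 2*j + 4] ...
dists = distance_vectors(
    n,
    write_idx=lambda i, j: (i + j, 3 * i + j + 3),
    read_idx=lambda i, j: (i + j + 1, i + 2 * j + 4),
)

# Naive candidate set (an assumption for illustration): any lexicographically
# positive distance inside an n-by-n iteration space is a nonnegative
# combination of (0, 1) and (1, -(n - 1)).
basis = [(0, 1), (1, -(n - 1))]
limit = n * (n - 1)  # largest coefficient that can be needed for this basis

print(sorted(dists))
print(all(covered(d, basis, limit) for d in dists))
```

The distances found differ from one pair of iterations to the next, which is what makes the dependences nonuniform. Enforcing only the two vectors in `basis` would still respect every real dependence, because each real distance is a sum of basis vectors, but a set this conservative may serialize far more than necessary; choosing a set that keeps more parallelism and locality is the problem the paper addresses.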