Compiler techniques for data synchronization in nested parallel loops

Authors:
Peiyi Tang;Pen-Chung Yew;Chuan-Qi Zhu
Affiliations:
Department of Computer Science, The Australian National University, Canberra, ACT 2601, Australia;Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Urbana, IL;Computer Center, Fudan University, Shanghai, China
Venue:
ICS '90 Proceedings of the 4th international conference on Supercomputing
Year:
1990



Abstract

The major source of parallelism in ordinary programs is do loops. When the iterations of a parallelized loop are executed on a multiprocessor, cross-iteration data dependences must be enforced by synchronization between processors. Existing data synchronization schemes are either too simple to handle general nested loop structures with non-trivial array subscript functions, or inefficient because of their large run-time overhead. In this paper, we propose a new synchronization scheme based on two data-oriented synchronization instructions: synch_read(x,s) and synch_write(x,s). We present an algorithm to compute the ordering number, s, for each data access. Using our scheme, a parallelizing compiler can parallelize a general nested loop structure with complicated cross-iteration data dependences. If the ordering numbers cannot be computed at compile time, the run-time overhead is still smaller than that of other existing run-time schemes.
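
To make the idea of data-oriented synchronization concrete, the following is a minimal Python sketch. It is not the authors' implementation. The abstract names synch_read(x,s) and synch_write(x,s) but does not define their semantics, so the sketch assumes that each datum carries a counter of completed accesses and that an access with ordering number s waits until at least s earlier accesses to that datum have finished. The class SyncCell, the function run_recurrence, and the hand-assigned ordering numbers are hypothetical names and values chosen for illustration; the paper's algorithm for computing s is not reproduced here.

```python
import threading


class SyncCell:
    """One shared datum plus a counter of completed accesses (assumed semantics)."""

    def __init__(self, value=0):
        self.value = value
        self.done = 0  # number of accesses to this datum completed so far
        self.cond = threading.Condition()

    def synch_read(self, s):
        # Assumption: block until at least s accesses to this datum have completed.
        with self.cond:
            self.cond.wait_for(lambda: self.done >= s)
            v = self.value
            self.done += 1
            self.cond.notify_all()
            return v

    def synch_write(self, s, v):
        # Same waiting rule as synch_read, followed by the store.
        with self.cond:
            self.cond.wait_for(lambda: self.done >= s)
            self.value = v
            self.done += 1
            self.cond.notify_all()


def run_recurrence(n):
    """Run the loop `a[i] = a[i-1] + 1` for i = 1..n with one thread per iteration."""
    a = [SyncCell() for _ in range(n + 1)]
    a[0].synch_write(0, 10)  # initial value: access number 0 on a[0]

    def iteration(i):
        # Ordering numbers assigned by hand for this loop:
        # each a[k] is written first (s = 0) and then read once (s = 1).
        prev = a[i - 1].synch_read(1)
        a[i].synch_write(0, prev + 1)

    # Start the iterations in reverse order to show that the result does not
    # depend on scheduling order.
    threads = [threading.Thread(target=iteration, args=(i,))
               for i in reversed(range(1, n + 1))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [cell.value for cell in a]


if __name__ == "__main__":
    print(run_recurrence(5))
```

In this sketch the synchronization state is attached to the data rather than to the loop iterations, so an iteration blocks only on the specific element it depends on. A real compiler following the scheme would derive the ordering numbers from dependence analysis instead of fixing them by hand.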