A Scheme to Enforce Data Dependence on Large Multiprocessor Systems

Authors:
Chuan-Qi Zhu;Pen-Chung Yew
Affiliations:
Univ. of Illinois at Urbana-Champaign, Urbana;Univ. of Illinois at Urbana-Champaign, Urbana
Venue:
IEEE Transactions on Software Engineering
Year:
1987

Citing 0
Cited 47

Compiler algorithms for synchronization
IEEE Transactions on Computers

Efficient synchronization of multiprocessors with shared memory
ACM Transactions on Programming Languages and Systems (TOPLAS)

An approach to synchronization for parallel computing
ICS '88 Proceedings of the 2nd international conference on Supercomputing

Impact of self-scheduling order on performance on multiprocessor systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing

Efficient synchronization primitives for large-scale cache-coherent multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems

On data synchronization for multiprocessors
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture

Efficient Doacross execution on distributed shared-memory multiprocessors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing

A scheme to extract run-time parallelism from sequential loops
ICS '91 Proceedings of the 5th international conference on Supercomputing

Compiler algorithms for event variable synchronization
ICS '91 Proceedings of the 5th international conference on Supercomputing

Removal of redundant dependences in DOACROSS loops with constant dependences
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming

Dynamic Processor Self-Scheduling for General Parallel Nested Loops
IEEE Transactions on Computers

Experiments with an ocean circulation model on CEDAR
ICS '92 Proceedings of the 6th international conference on Supercomputing

The Cedar system and an initial performance study
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture

Advanced compiler optimizations for sparse computations
Proceedings of the 1993 ACM/IEEE conference on Supercomputing

The privatizing DOALL test: a run-time technique for DOALL loop identification and array privatization
ICS '94 Proceedings of the 8th international conference on Supercomputing

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

Run-time methods for parallelizing partially parallel loops
ICS '95 Proceedings of the 9th international conference on Supercomputing

Compiler techniques for data synchronization in nested parallel loops
ICS '90 Proceedings of the 4th international conference on Supercomputing

Coarse-grained speculative execution in shared-memory multiprocessors
ICS '98 Proceedings of the 12th international conference on Supercomputing

The Cedar system and an initial performance study
25 years of the international symposia on Computer architecture (selected papers)

The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization
IEEE Transactions on Parallel and Distributed Systems

The impact of synchronization and granularity on parallel systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture

Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
IEEE Transactions on Parallel and Distributed Systems

Automatic array privatization
Compiler optimizations for scalable parallel systems

Coarse-Grained Thread Pipelining: A Speculative Parallel Execution Model for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems

Techniques for speculative run-time parallelization of loops
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing

An efficient algorithm for the run-time parallelization of DOACROSS loops
Proceedings of the 1994 ACM/IEEE conference on Supercomputing

Compiler optimization of scalar value communication between speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems

An Empirical Study of Fortran Programs for Parallelizing Compilers
IEEE Transactions on Parallel and Distributed Systems

Removal of Redundant Dependences in DOACROSS Loops with Constant Dependences
IEEE Transactions on Parallel and Distributed Systems

Partitioning and Labeling of Loops by Unimodular Transformations
IEEE Transactions on Parallel and Distributed Systems

Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs
IEEE Transactions on Parallel and Distributed Systems

Run-time data-flow analysis
Journal of Computer Science and Technology

Time-Stamping Algorithms for Parallelization of Loops at Run-Time
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing

The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium

Principles of Speculative Run-Time Parallelization
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing

Improving Locality in the Parallelization of Doacross Loops (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing

Techniques for Reducing the Overhead of Run-Time Parallelization
CC '00 Proceedings of the 9th International Conference on Compiler Construction

An Efficient Run-Time Scheme for Exploiting Parallelism on Multiprocessor Systems
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing

An Efficient Technique of Instruction Scheduling on a Superscalar-Based Multiprocessor
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing

The Illinois Aggressive Coma Multiprocessor project (I-ACOMA)
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation

Predicting locality phases for dynamic memory optimization
Journal of Parallel and Distributed Computing

The Fortran parallel transformer and its programming environment
Information Sciences: an International Journal

Predecessor/successor approach for high-performance run-time wavefront scheduling
Information Sciences: an International Journal

Parallelization of utility programs based on behavior phase analysis
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing

Speculative parallelization: eliminating the overhead of failure
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications

Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.01

Abstract

Enforcing data dependence in parallel algorithms requires certain synchronization primitives. For simple data dependences, synchronization primitives such as the Full/Empty bit in the HEP machine [5] can be very effective. However, if the data dependence cannot be determined at compile time, or if it is very complicated, more efficient synchronization schemes and algorithms are needed.
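
To make the idea of a full/empty-bit primitive concrete, the following is a minimal software sketch of the general semantics: a write waits until the cell is empty, and a read waits until it is full. It is not the scheme proposed in the paper, nor a model of the HEP hardware; the names `FullEmptyCell` and `run_dependent_loop` are hypothetical, and the recurrence a[i] = a[i-1] + i is an assumed example of a simple loop-carried dependence.

```python
import threading


class FullEmptyCell:
    """A memory cell guarded by a full/empty flag (hypothetical illustration)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write(self, value):
        # The producer blocks until the cell is empty, then fills it.
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read(self):
        # The consumer blocks until the cell is full, then empties it.
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def run_dependent_loop(n):
    """Run a[i] = a[i-1] + i with one thread per iteration.

    Each value a[i-1] is passed to iteration i through its own cell,
    so the dependence is enforced regardless of thread start order.
    """
    cells = [FullEmptyCell() for _ in range(n + 1)]
    results = [None] * (n + 1)

    def iteration(i):
        prev = cells[i - 1].read()   # wait for the value from iteration i-1
        results[i] = prev + i
        cells[i].write(results[i])   # make the value available to iteration i+1

    threads = [threading.Thread(target=iteration, args=(i,))
               for i in range(1, n + 1)]
    # Start in reverse order to show that start order does not matter.
    for t in reversed(threads):
        t.start()
    results[0] = 0
    cells[0].write(0)                # seed the chain with a[0] = 0
    for t in threads:
        t.join()
    return results
```

Calling `run_dependent_loop(5)` computes the running sums in dependence order. This sketch handles only a dependence that is fully known in advance, with one cell per value; it does not address the harder cases the abstract mentions, where the dependence is unknown at compile time or very complicated.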