Current parallelizing compilers cannot identify a significant fraction of fully parallel loops because their access patterns are complex or cannot be sufficiently determined at compile time. For this reason, we have developed the Privatizing DOALL test, a technique for identifying fully parallel loops at run-time and dynamically privatizing scalars and arrays. The test itself is fully parallel and can be applied to any loop, regardless of the structure of its data and/or control flow. The technique can be used in two modes: (i) the test is performed before executing the loop and indicates whether the loop can be executed as a DOALL; (ii) speculatively: the loop and the test are executed simultaneously, and it is determined afterward whether the loop was in fact parallel. The test can also be used for debugging parallel programs. We discuss how the test can be inserted automatically by the compiler and outline a cost/performance analysis that can be performed to decide when to use the test. Our conclusion is that the test should almost always be applied: as we show, the expected speedup for fully parallel loops is significant, while the cost of a failed test (a loop that turns out not to be fully parallel) is minimal. We present experimental results on loops from the PERFECT Benchmarks that confirm our conclusion that this test can lead to significant speedups.
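To make the marking scheme concrete, below is a minimal sketch of how shadow arrays could record reads and writes during speculative execution, and how a post-execution analysis could decide whether the loop was fully parallel. It is our illustration, not code from the paper (which targets Fortran loops); all names here (ShadowTest, mark_write, mark_read, passes) are hypothetical. The sketch assumes each iteration writes into a private copy of the shared array and keeps an iteration-private shadow of the elements it has written.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Simplified shadow-array sketch of a run-time DOALL test with
// privatization, in the spirit of the abstract. Hypothetical API.
class ShadowTest {
public:
    explicit ShadowTest(std::size_t n) : written_(n), exposed_read_(n) {}

    // Record a write to element i by one iteration. `wrote` is that
    // iteration's private shadow: the elements it has already written.
    void mark_write(std::size_t i, std::vector<char>& wrote) {
        if (!wrote[i]) {
            wrote[i] = 1;
            if (!written_[i].exchange(true, std::memory_order_relaxed))
                ++distinct_writes_;   // first iteration to write element i
            ++total_writes_;          // one count per (element, iteration)
        }
    }

    // Record a read of element i. A read is "exposed" if the same
    // iteration has not written the element first; exposed reads are
    // exactly what privatization cannot hide.
    void mark_read(std::size_t i, const std::vector<char>& wrote) {
        if (!wrote[i])
            exposed_read_[i].store(true, std::memory_order_relaxed);
    }

    // Post-execution analysis: the loop was fully parallel if no element
    // carries both a write mark and an exposed-read mark, since that
    // pairing signals a cross-iteration flow dependence.
    bool passes() const {
        for (std::size_t i = 0; i < written_.size(); ++i)
            if (written_[i].load(std::memory_order_relaxed) &&
                exposed_read_[i].load(std::memory_order_relaxed))
                return false;
        return true;
    }

    // If some element was written by more than one iteration, the loop
    // is a DOALL only because each iteration wrote a private copy.
    bool needed_privatization() const {
        return total_writes_.load() != distinct_writes_.load();
    }

private:
    std::vector<std::atomic<bool>> written_;      // written by any iteration
    std::vector<std::atomic<bool>> exposed_read_; // read before written in an iteration
    std::atomic<long> total_writes_{0};           // per-(element, iteration) writes
    std::atomic<long> distinct_writes_{0};        // elements written at least once
};
```

In speculative mode, the run-time system would presumably checkpoint the arrays the loop may modify, execute the iterations in parallel while invoking mark_write and mark_read at each access, and, if passes() returns false, restore the checkpoint and re-execute the loop sequentially; if it returns true, the iteration-private copies are merged back into the shared array. The marking phase itself has no cross-iteration dependences, which reflects the abstract's claim that the test is fully parallel.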