The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

Authors:
Lawrence Rauchwerger;David A. Padua
Affiliations:
Texas A&M Univ., College Station;Univ. of Illinois at Urbana-Champaign, Urbana
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1999

Citing 35
Cited 54

Virtual time

ACM Transactions on Programming Languages and Systems (TOPLAS)
Advanced compiler optimizations for supercomputers

Communications of the ACM - Special issue on parallelism
A Scheme to Enforce Data Dependence on Large Multiprocessor Systems

IEEE Transactions on Software Engineering
Compiler algorithms for synchronization

IEEE Transactions on Computers
Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design

IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
An approach to synchronization for parallel computing

ICS '88 Proceedings of the 2nd international conference on Supercomputing
On-the-fly detection of access anomalies

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Supercompilers for parallel and vector computers

Supercompilers for parallel and vector computers
The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
An empirical comparison of monitoring algorithms for access anomaly detection

PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Run-Time Parallelization and Scheduling of Loops

IEEE Transactions on Computers
Introduction to parallel algorithms and architectures: array, trees, hypercubes

Introduction to parallel algorithms and architectures: array, trees, hypercubes
On-the-fly detection of data races for programs with nested fork-join parallelism

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Array privatization for parallel execution of loops

ICS '92 Proceedings of the 6th international conference on Supercomputing
Improving the performance of runtime parallelization

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compile-time support for efficient data race detection in shared-memory parallel programs

PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
Massively parallel methods for engineering and science problems

Communications of the ACM
The privatizing DOALL test: a run-time technique for DOALL loop identification and array privatization

ICS '94 Proceedings of the 8th international conference on Supercomputing
The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
A scalable method for run-time loop parallelization

International Journal of Parallel Programming
The doconsider loop

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Optimizing Supercompilers for Supercomputers

Optimizing Supercompilers for Supercomputers
Dependence Analysis for Supercomputing

Dependence Analysis for Supercomputing
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Dependence graphs and compiler optimizations

POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An efficient algorithm for the run-time parallelization of DOACROSS loops

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Detecting Nondeterminacy in Parallel Programs

IEEE Software
Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs

IEEE Transactions on Parallel and Distributed Systems
Time-Stamping Algorithms for Parallelization of Loops at Run-Time

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Data Dependence and Data-Flow Analysis of Arrays

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Automatic Array Privatization

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Parallelizing while loops for multiprocessor systems

IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Effects of Parallelism Degree on Run-Time Parallelization of Loops

HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences-Volume 7 - Volume 7
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture

Adaptive reduction parallelization techniques

Proceedings of the 14th international conference on Supercomputing
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences

IEEE Transactions on Parallel and Distributed Systems
Fractal symbolic analysis

ICS '01 Proceedings of the 15th international conference on Supercomputing
Hybrid analysis: static & dynamic memory reference analysis

ICS '02 Proceedings of the 16th international conference on Supercomputing
SmartApps: An Application Centric Approach to High Performance Computing: Compiler-Assisted Software and Hardware Support for Reduction Operations

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Run-Time Parallelization Optimization Techniques

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
SmartApps: An Application Centric Approach to High Performance Computing

LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Irregular Assignment Computations on cc-NUMA Multiprocessors

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Techniques for Reducing the Overhead of Run-Time Parallelization

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Speculative Parallelization of Partially Parallel Loops

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
An Efficient Run-Time Scheme for Exploiting Parallelism on Multiprocessor Systems

HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Fractal symbolic analysis

ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Hybrid analysis: static & dynamic memory reference analysis

International Journal of Parallel Programming
Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Look left, look right, look left again: an application of fractal symbolic analysis to linear algebra code restructuring

International Journal of Parallel Programming
SmartApps: middle-ware for adaptive applications on reconfigurable platforms

ACM SIGOPS Operating Systems Review
In search of a program generator to implement generic transformations for high-performance computing

Science of Computer Programming - Special issue on the first MetaOCaml workshop 2004
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
Optimistic parallelism requires abstractions

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Incrementally parallelizing database transactions with thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization

Journal of Parallel and Distributed Computing
Optimistic parallelism benefits from data partitioning

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
PEAK—a fast and effective performance tuning system via compiler optimization orchestration

ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel-stage decoupled software pipelining

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Spice: speculative parallel iteration chunk execution

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Scheduling strategies for optimistic parallel execution of irregular programs

Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
On the Scalability of an Automatically Parallelized Irregular Application

Languages and Compilers for Parallel Computing
Mostly static program partitioning of binary executables

ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimistic parallelism requires abstractions

Communications of the ACM - The Status of the P versus NP Problem
A lightweight in-place implementation for software thread-level speculation

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Predecessor/successor approach for high-performance run-time wavefront scheduling

Information Sciences: an International Journal
Speculative parallelization using software multi-threaded transactions

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An adaptive scheme for dynamic parallelization

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Decoupled software pipelining creates parallelization opportunities

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Speculative parallelization of partial reduction variables

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Speculative parallelization using state separation and multiple value prediction

Proceedings of the 2010 international symposium on Memory management
Towards a science of parallel programming

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A shape analysis for optimizing parallel graph programs

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Enhanced speculative parallelization via incremental recovery

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
The tao of parallelism in algorithms

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Exploiting coarse-grain speculative parallelism

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Loop selection for thread-level speculation

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
An inspector-executor algorithm for irregular assignment parallelization

ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Automatic parallelization using the value evolution graph

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
The parawise expert assistant – widening accessibility to efficient and scalable tool generated OpenMP code

WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
Fastpath speculative parallelization

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Dynamically accelerating client-side web applications through decoupled execution

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Logical inference techniques for loop parallelization

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Speculative parallelization: eliminating the overhead of failure

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Fast condensation of the program dependence graph

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Leveraging GPUs using cooperative loop speculation

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall and apply a fully parallel data dependence test to determine if it had any cross-iteration dependences; if the test fails, then the loop is reexecuted serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run-time. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching: It detects at run-time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks, which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods