The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

Authors:
Lawrence Rauchwerger; David Padua
Affiliations:
University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign
Venue:
PLDI '95: Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation
Year:
1995

Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because those loops have access patterns that are complex or insufficiently defined at compile time. Since such loops arise frequently in practice, we advocate a novel framework for identifying them: speculatively execute the loop as a doall and apply a fully parallel data dependence test to determine whether it had any cross-iteration dependences; if the test fails, the loop is re-executed serially. In our experience, a significant amount of the parallelism available in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, so our methods can speculatively apply these transformations and then check their validity at run time. Another important contribution of this paper is a novel method for reduction recognition that goes beyond syntactic pattern matching: it detects at run time whether the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks that substantiate our claim that these techniques can yield significant speedups, often superior to those obtainable by inspector/executor methods.
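The abstract describes the run-time test only at a high level. What follows is a minimal sequential Python simulation of one plausible shadow-array scheme for the dependence and privatization part of such a test. It assumes each iteration's accesses to the array under test are available as an ordered trace of reads and writes. The function name `privatizing_doall_test`, the trace format, and the shadow arrays `shadow_w`, `shadow_r`, and `shadow_np` are illustrative assumptions, not the paper's exact bookkeeping, and the run-time reduction recognition is omitted. Every mark depends only on the accesses of a single iteration, so marking could proceed on separate processors while the speculative doall runs. A final analysis step then combines the marks.

```python
from typing import List, Tuple

# Hypothetical trace format: one access to the array under test,
# as ("r" or "w", element index), in program order.
Access = Tuple[str, int]


def privatizing_doall_test(iterations: List[List[Access]], size: int) -> str:
    """Hypothetical sketch: validate a speculatively executed doall.

    iterations[k] is the ordered access trace of iteration k.
    Returns "doall", "doall_privatized", or "not_doall".
    Reduction recognition is not modeled here.
    """
    shadow_w = [False] * size   # written by some iteration
    shadow_r = [False] * size   # read by an iteration that never writes it
    shadow_np = [False] * size  # read before being written within an iteration
    total_writes = 0            # sum over iterations of distinct elements written

    # Marking phase: each mark depends only on the current iteration's trace,
    # so iterations could be marked independently on separate processors.
    for trace in iterations:
        writes_in_iter = {i for op, i in trace if op == "w"}
        written_so_far = set()
        for op, i in trace:
            if op == "w":
                shadow_w[i] = True
                written_so_far.add(i)
            else:
                if i not in writes_in_iter:
                    shadow_r[i] = True
                if i not in written_so_far:
                    shadow_np[i] = True
        total_writes += len(writes_in_iter)

    # Analysis phase: combine the marks from all iterations.
    marked = sum(shadow_w)  # number of distinct elements written at all
    if any(w and r for w, r in zip(shadow_w, shadow_r)):
        # One iteration writes an element that a different iteration
        # only reads: a cross-iteration flow or anti dependence.
        return "not_doall"
    if total_writes == marked:
        # No element is written by more than one iteration.
        return "doall"
    if any(w and p for w, p in zip(shadow_w, shadow_np)):
        # A written element is read before being written in some iteration,
        # so a private copy would lack its incoming value.
        return "not_doall"
    # Every read of a written element follows a write in the same iteration,
    # so privatizing the array removes the cross-iteration conflicts.
    return "doall_privatized"


if __name__ == "__main__":
    n = 4
    # A[k] = A[k] + 1: each iteration touches only its own element.
    print(privatizing_doall_test([[("r", k), ("w", k)] for k in range(n)], n))
    # prints "doall"
    # A[k + 1] = A[k]: a cross-iteration flow dependence.
    print(privatizing_doall_test([[("r", k), ("w", k + 1)] for k in range(n)], n + 1))
    # prints "not_doall"
    # A[0] used as a temporary: written, then read, in every iteration.
    print(privatizing_doall_test([[("w", 0), ("r", 0)] for _ in range(n)], 1))
    # prints "doall_privatized"
```

In the framework the abstract describes, a `not_doall` result would mean discarding the speculative results, restoring the loop's prior state, and re-executing the loop serially. The last case shows why privatization matters: every iteration writes `A[0]`, so without private copies the loop would have output dependences, yet each iteration defines the value before using it.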