Run-Time Parallelization and Scheduling of Loops

Authors:
Joel H. Saltz; Ravi Mirchandaney; Kay Crowley
Affiliations:
ICASE, NASA Langley Research Center, Hampton, VA; Yale Univ., New Haven, CT; Yale Univ., New Haven, CT
Venue:
IEEE Transactions on Computers
Year:
1991

Abstract

The authors study run-time methods that automatically parallelize and schedule the iterations of a do loop in cases where compile-time information is inadequate. The methods rely on execution-time preprocessing of the loop. At compile time, they set up the framework for a loop dependency analysis. At run time, they identify wavefronts of loop iterations that can execute concurrently, and use this wavefront information to reorder the iterations for increased parallelism. The authors use symbolic transformation rules to produce two kinds of code: inspector procedures, which perform the execution-time preprocessing, and executors, which are transformed versions of the source loop structures that carry out the calculations planned by the inspector procedures. Performance results from experiments on the Encore Multimax show that run-time reordering of loop indexes can have a significant impact on performance.
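The inspector/executor split described in the abstract can be sketched in a few lines. This is a minimal illustration under assumptions of our own, not the authors' code: the loop is taken to be a sparse lower-triangular solve stored in compressed-row form (hypothetical arrays `rowptr`, `col`, `val`), each iteration `i` reads only values written by earlier iterations, and the function names `inspector`, `executor`, `loop_body`, and `sequential` are hypothetical. The inspector gives each iteration a wavefront number one greater than the largest wavefront among the iterations it depends on, then groups iterations by wavefront. The executor runs the iterations of each wavefront concurrently and waits for the whole wavefront to finish before starting the next.

```python
from concurrent.futures import ThreadPoolExecutor


def inspector(n, rowptr, col):
    """Run-time preprocessing: group iterations 0..n-1 into wavefronts.

    Iteration i reads y[col[j]] for j in rowptr[i]..rowptr[i+1]-1, and
    y[k] is written only by iteration k, so i must run after every such k.
    """
    wavefront = [0] * n
    for i in range(n):
        deps = col[rowptr[i]:rowptr[i + 1]]
        # Assumption of this sketch: dependences point only to earlier iterations.
        if any(k >= i for k in deps):
            raise ValueError(f"iteration {i} depends on a later iteration")
        # One more than the deepest predecessor; 0 if there are none.
        wavefront[i] = 1 + max((wavefront[k] for k in deps), default=-1)
    # Reorder: list the iterations of each wavefront together.
    schedule = [[] for _ in range(max(wavefront, default=-1) + 1)]
    for i, w in enumerate(wavefront):
        schedule[w].append(i)
    return schedule


def loop_body(i, rowptr, col, val, b, y):
    """Original loop body: y[i] = b[i] - sum of val[j] * y[col[j]]."""
    s = b[i]
    for j in range(rowptr[i], rowptr[i + 1]):
        s -= val[j] * y[col[j]]
    y[i] = s


def sequential(n, rowptr, col, val, b):
    """Reference: the loop as written, iterations in source order."""
    y = list(b)
    for i in range(n):
        loop_body(i, rowptr, col, val, b, y)
    return y


def executor(schedule, rowptr, col, val, b, workers=4):
    """Run each wavefront's iterations concurrently, one wavefront at a time."""
    y = list(b)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for wave in schedule:
            # Iterations in one wavefront never read each other's results.
            # Draining map() waits for all of them: a barrier between wavefronts.
            list(pool.map(lambda i: loop_body(i, rowptr, col, val, b, y), wave))
    return y


if __name__ == "__main__":
    # Hypothetical 5-iteration example: 2 reads 0, 3 reads 1, 4 reads 2 and 3.
    rowptr = [0, 0, 0, 1, 2, 4]
    col = [0, 1, 2, 3]
    val = [0.5, 2.0, 1.0, 1.0]
    b = [2.0, 1.0, 3.0, 4.0, 10.0]
    schedule = inspector(5, rowptr, col)
    print(schedule)                                 # [[0, 1], [2, 3], [4]]
    print(executor(schedule, rowptr, col, val, b))  # [2.0, 1.0, 2.0, 2.0, 6.0]
    print(sequential(5, rowptr, col, val, b))       # [2.0, 1.0, 2.0, 2.0, 6.0]
```

In CPython the thread pool gives no real speedup because of the global interpreter lock; it is used here only to make the wavefront-by-wavefront structure and the synchronization between wavefronts explicit. Because all reordering happens in the inspector, the executor calls the original loop body unchanged, and the same schedule could be reused for later executions of the loop as long as the index arrays do not change.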