Automatic decomposition of scientific programs for parallel execution

Authors:
R. Allen;D. Callahan;K. Kennedy
Affiliations:
Department of Computer Science, Rice University, Houston, Texas;Department of Computer Science, Rice University, Houston, Texas;Department of Computer Science, Rice University, Houston, Texas
Venue:
POPL '87 Proceedings of the 14th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Year:
1987

Abstract

An algorithm for transforming sequential programs into equivalent parallel programs is presented. The method concentrates on finding loops whose separate iterations can be run in parallel without synchronization. Although a simple version of the method can be shown to be optimal, the problem of generating optimal code when loop interchange is employed is shown to be intractable. These methods are implemented in an experimental translation system developed at Rice University.
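To make concrete what "iterations can be run in parallel without synchronization" means, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes a hypothetical single-statement loop of the form `a[i + w] = a[i + r] + 1` and uses the standard distance test for that one case: the write in iteration `i` and the read in iteration `j` touch the same element exactly when `j - i = w - r`. The names `carried_distance` and `run_loop` are illustrative only.

```python
def carried_distance(write_offset, read_offset, trip_count):
    """Return the loop-carried dependence distance, or None if there is none.

    Models the loop:  for i in range(trip_count): a[i + w] = a[i + r] + 1
    Iteration i writes a[i + w]; iteration j reads a[j + r].
    They touch the same element when j - i == w - r.
    """
    distance = write_offset - read_offset
    # Distance 0 means the dependence stays inside one iteration.
    # A distance at least as large as the trip count never occurs in the loop.
    if distance == 0 or abs(distance) >= trip_count:
        return None
    return distance


def run_loop(write_offset, read_offset, trip_count, order):
    """Execute the modelled loop with its iterations in the given order."""
    pad = max(abs(write_offset), abs(read_offset))
    a = [0] * (trip_count + 2 * pad)  # padding keeps every index in range
    for i in order:
        a[pad + i + write_offset] = a[pad + i + read_offset] + 1
    return a


n = 8
forward = list(range(n))
backward = list(reversed(range(n)))

# a[i] = a[i] + 1: no carried dependence, so iteration order does not matter.
independent_ok = run_loop(0, 0, n, forward) == run_loop(0, 0, n, backward)

# a[i + 1] = a[i] + 1: distance 1, so reordering changes the result.
carried_ok = run_loop(1, 0, n, forward) == run_loop(1, 0, n, backward)
```

In this sketch, a loop with no carried dependence gives the same result under any iteration order, which is the property that lets its iterations be assigned to different processors without synchronization. Real subscripts, nested loops, and loop interchange require the fuller dependence analysis the paper addresses.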