The parallel execution of DO loops

Author: Leslie Lamport
Affiliation: Massachusetts Computer Associates, Inc., Wakefield
Venue: Communications of the ACM
Year: 1974

Abstract

Methods are developed for the parallel execution of different iterations of a DO loop. Both asynchronous multiprocessor computers and array computers are considered. Practical application to the design of compilers for such computers is discussed.
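The abstract does not spell out the methods. This paper is widely credited with the hyperplane (wavefront) method: find a linear time function over the loop indices so that every dependence points forward in time, then run all iterations that share a time value concurrently. The minimal Python sketch below illustrates that idea. It rests on assumptions. The loop body, the array `a`, the dependence vectors, and the helper names (`is_valid_hyperplane`, `run_sequential`, `run_wavefront`) are hypothetical and chosen for illustration. They are not reproduced from the paper.

```python
from itertools import product

# Hypothetical loop nest for illustration (not taken from the paper):
#   DO i = 1, N-1
#     DO j = 1, M-1
#       A(i, j) = A(i-1, j) + A(i, j-1)
# Its dependence distance vectors are (1, 0) and (0, 1).
DEPS = [(1, 0), (0, 1)]


def is_valid_hyperplane(pi, deps):
    """A linear time function t(i, j) = pi . (i, j) is a legal schedule
    only if every dependence moves strictly forward in time: pi . d >= 1."""
    return all(sum(p * d for p, d in zip(pi, dep)) >= 1 for dep in deps)


def init(n, m):
    # Boundary row and column are 1; interior cells start at 0.
    return [[1 if i == 0 or j == 0 else 0 for j in range(m)] for i in range(n)]


def run_sequential(n, m):
    """Original DO-loop order: one iteration after another."""
    a = init(n, m)
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def run_wavefront(n, m, pi=(1, 1)):
    """Execute the iterations hyperplane by hyperplane.

    All points with the same t = pi . (i, j) are mutually independent, so
    they could run concurrently. Here they run in reverse order to show that
    their order within one hyperplane does not affect the result."""
    assert is_valid_hyperplane(pi, DEPS), "schedule violates a dependence"
    a = init(n, m)
    schedule = {}
    for i, j in product(range(1, n), range(1, m)):
        schedule.setdefault(pi[0] * i + pi[1] * j, []).append((i, j))
    for t in sorted(schedule):
        for i, j in reversed(schedule[t]):  # conceptually one parallel step
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a, len(schedule)


if __name__ == "__main__":
    seq = run_sequential(5, 5)
    par, steps = run_wavefront(5, 5)
    assert seq == par
    print(steps)  # 7 hyperplane steps instead of 16 sequential iterations
```

For a 5 by 5 array, the 16 interior iterations collapse into 7 hyperplanes (t = i + j from 2 to 8). So with enough processors the loop would take 7 parallel steps. A time function such as `(1, 0)` is rejected because it leaves the `(0, 1)` dependence inside a single time step.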