Communication optimization and code generation for distributed memory machines

Authors:
Saman P. Amarasinghe; Monica S. Lam
Venue:
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Year:
1993

Abstract

This paper presents several algorithms that solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors of a machine, our algorithms produce an SPMD (single program, multiple data) program to be run on each processor. Our compiler generates the necessary send and receive instructions, optimizes the communication by eliminating redundant communication and aggregating small messages into large ones, allocates space locally on each processor, and translates global data addresses to local addresses.

Our techniques are based on an exact data-flow analysis of individual array element accesses. Unlike data dependence analysis, this analysis determines whether two dynamic instances refer to the same value, not just to the same location. Using this information, our compiler can handle more flexible data decompositions and find more opportunities for communication optimization than systems based on data dependence analysis.

Our techniques are built on a uniform framework in which data decompositions, computation decompositions, and the data-flow information are all represented as systems of linear inequalities. We show that the problems of communication code generation, local memory management, message aggregation, and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower-dimensional spaces.
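
The projection step mentioned at the end of the abstract can be illustrated with a small sketch. The abstract does not name the projection method, so the use of Fourier-Motzkin elimination here is an assumption, as are the function names `eliminate` and `integer_range` and the example of 100 iterations split into blocks of 25. The elimination is exact over the rationals; over the integers it may over-approximate in general, although it happens to be exact for this example.

```python
import math
from fractions import Fraction
from itertools import product


def eliminate(ineqs, k):
    """Project variable k out of a system of inequalities a . x <= b.

    Each inequality is a pair (a, b), where a is a tuple of coefficients.
    The result has the same number of columns, with coefficient 0 at k.
    """
    pos, neg, rest = [], [], []
    for a, b in ineqs:
        a = tuple(Fraction(c) for c in a)
        b = Fraction(b)
        if a[k] > 0:
            pos.append((a, b))      # gives an upper bound on x_k
        elif a[k] < 0:
            neg.append((a, b))      # gives a lower bound on x_k
        else:
            rest.append((a, b))     # does not involve x_k
    # Pair every upper bound with every lower bound so that x_k cancels.
    for (ap, bp), (an, bn) in product(pos, neg):
        sp, sn = 1 / ap[k], 1 / -an[k]
        a = tuple(sp * p + sn * n for p, n in zip(ap, an))
        rest.append((a, sp * bp + sn * bn))
    return rest


def integer_range(ineqs, k):
    """Integer bounds on x_k once every other variable has been eliminated."""
    lo, hi = None, None
    for a, b in ineqs:
        if any(c != 0 for j, c in enumerate(a) if j != k):
            raise ValueError("other variables are still present")
        coeff, bound = Fraction(a[k]), Fraction(b)
        if coeff > 0:
            value = math.floor(bound / coeff)
            hi = value if hi is None else min(hi, value)
        elif coeff < 0:
            value = math.ceil(bound / coeff)
            lo = value if lo is None else max(lo, value)
        elif bound < 0:
            raise ValueError("system is infeasible")
    return lo, hi


# Hypothetical example over variables x = (i, p):
# loop bounds 0 <= i <= 99, and processor p executes 25p <= i <= 25p + 24.
system = [
    ((-1, 0), 0),     # -i <= 0
    ((1, 0), 99),     #  i <= 99
    ((-1, 25), 0),    #  25p - i <= 0
    ((1, -25), 24),   #  i - 25p <= 24
]

# Projecting out i leaves the set of processors that have any work.
processors = integer_range(eliminate(system, 0), 1)
print(processors)  # (0, 3)
```

Projecting out `p` instead recovers the full iteration range `(0, 99)`. The same operation applied in the other direction, keeping `p` as a symbolic parameter, is what yields per-processor loop bounds in an SPMD program.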