Effective communication coalescing for data-parallel applications

Authors:
Daniel Chavarría-Miranda;John Mellor-Crummey
Affiliations:
Pacific Northwest National Laboratory (PNNL);Rice University
Venue:
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2005

Citing 14
Cited 9

Global value numbers and redundant computations

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic array alignment in data-parallel programs

POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The Omega Library interface guide

The Omega Library interface guide
An HPF compiler for the IBM SP2

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Global communication analysis and optimization

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
A Unified Framework for Optimizing Communication in Data-Parallel Programs

IEEE Transactions on Parallel and Distributed Systems
Using integer sets for data-parallel program analysis and optimization

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Co-array Fortran for parallel programming

ACM SIGPLAN Fortran Forum
A global communication optimization technique based on data-flow analysis and linear algebra

ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiling for Distributed Memory Architectures

IEEE Transactions on Parallel and Distributed Systems
An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Automatic Vectorization of Communications for Data-Parallel Programs

Euro-Par '95 Proceedings of the First International Euro-Par Conference on Parallel Processing
Generalized multipartitioning of multi-dimensional arrays for parallelizing line-sweep computations

Journal of Parallel and Distributed Computing - Special section best papers from the 2002 international parallel and distributed processing symposium
A Multi-Platform Co-Array Fortran Compiler

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques

Communication Optimizations for Fine-Grained UPC Applications

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Performance portable optimizations for loops containing communication operations

Proceedings of the 22nd annual international conference on Supercomputing
A heuristic rule of partitioning irregular loop for parallelizing compilers

HPCA'09 Proceedings of the Second international conference on High Performance Computing and Applications
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Automatic communication coalescing for irregular computations in UPC language

CASCON '12 Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research
Improving communication in PGAS environments: static and dynamic coalescing in UPC

Proceedings of the 27th international ACM conference on International conference on supercomputing
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA

Proceedings of the Conference on Design, Automation and Test in Europe
Generating efficient data movement code for heterogeneous architectures with distributed-memory

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Automatic data allocation and buffer management for multi-GPU machines

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Communication coalescing is a static optimization that can reduce both communication frequency and redundant data transfer in compiler-generated code for regular, data parallel applications. We present an algorithm for coalescing communication that arises when generating code for regular, data-parallel applications written in High Performance Fortran (HPF). To handle sophisticated computation partitionings, our algorithm normalizes communication before attempting coalescing. We experimentally evaluate our algorithm, which is implemented in the dHPF compiler, in the compilation of HPF versions of the NAS application benchmarks SP, BT and LU. Our normalized coalescing algorithm improves the performance and scalability of compiler-generated code for these benchmarks by reducing the communication volume up to 55% compared to a simpler coalescing strategy and enables us to match the communication volume and frequency in hand-optimized MPI implementations of these codes.