Global value numbers and redundant computations
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic array alignment in data-parallel programs
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The Omega Library interface guide
The Omega Library interface guide
An HPF compiler for the IBM SP2
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Global communication analysis and optimization
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
A Unified Framework for Optimizing Communication in Data-Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
Using integer sets for data-parallel program analysis and optimization
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
A global communication optimization technique based on data-flow analysis and linear algebra
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiling for Distributed Memory Architectures
IEEE Transactions on Parallel and Distributed Systems
An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Automatic Vectorization of Communications for Data-Parallel Programs
Euro-Par '95 Proceedings of the First International Euro-Par Conference on Parallel Processing
Generalized multipartitioning of multi-dimensional arrays for parallelizing line-sweep computations
Journal of Parallel and Distributed Computing - Special section best papers from the 2002 international parallel and distributed processing symposium
A Multi-Platform Co-Array Fortran Compiler
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Communication Optimizations for Fine-Grained UPC Applications
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Performance portable optimizations for loops containing communication operations
Proceedings of the 22nd annual international conference on Supercomputing
A heuristic rule of partitioning irregular loop for parallelizing compilers
HPCA'09 Proceedings of the Second international conference on High Performance Computing and Applications
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Automatic communication coalescing for irregular computations in UPC language
CASCON '12 Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research
Improving communication in PGAS environments: static and dynamic coalescing in UPC
Proceedings of the 27th international ACM conference on International conference on supercomputing
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the Conference on Design, Automation and Test in Europe
Generating efficient data movement code for heterogeneous architectures with distributed-memory
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Automatic data allocation and buffer management for multi-GPU machines
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Communication coalescing is a static optimization that can reduce both communication frequency and redundant data transfer in compiler-generated code for regular, data parallel applications. We present an algorithm for coalescing communication that arises when generating code for regular, data-parallel applications written in High Performance Fortran (HPF). To handle sophisticated computation partitionings, our algorithm normalizes communication before attempting coalescing. We experimentally evaluate our algorithm, which is implemented in the dHPF compiler, in the compilation of HPF versions of the NAS application benchmarks SP, BT and LU. Our normalized coalescing algorithm improves the performance and scalability of compiler-generated code for these benchmarks by reducing the communication volume up to 55% compared to a simpler coalescing strategy and enables us to match the communication volume and frequency in hand-optimized MPI implementations of these codes.