Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors

Authors:
Affiliations:
Venue:
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Year:
1998

Citing 10
Cited 11

Process decomposition through locality of reference

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Efficiently computing static single assignment form and the control dependence graph

ACM Transactions on Programming Languages and Systems (TOPLAS)
Communication optimization and code generation for distributed memory machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
The high performance Fortran handbook

The high performance Fortran handbook
Parallelizing complex scans and reductions

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Flattening and parallelizing irregular, recurrent loop nests

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Detection and global optimization of reduction operations for distributed parallel machines

ICS '96 Proceedings of the 10th international conference on Supercomputing
MPI: The Complete Reference

MPI: The Complete Reference
Communication-Minimal Partitioning of Parallel Loops and Data Arrays for Cache-Coherent Distributed-Memory Multiprocessors

LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing

Using integer sets for data-parallel program analysis and optimization

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
High-level Language Support for User-defined Reductions

The Journal of Supercomputing
Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Compilation and Runtime-Optimizations for Software Distributed Shared Memory

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
A Comparison of Locality Transformations for Irregular Codes

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Towards automatic parallelization of tree reductions in dynamic programming

Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
An extended OpenMP targeting on the hybrid architecture of SMP-cluster

ACST'06 Proceedings of the 2nd IASTED international conference on Advances in computer science and technology
A translation system for enabling data mining applications on GPUs

Proceedings of the 23rd international conference on Supercomputing
Compiler and middleware support for scalable data mining

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Logical inference techniques for loop parallelization

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Financial software on GPUs: between Haskell and Fortran

Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents reduction recognition and parallel code generation strategies for distributed-memory multiprocessors. We describe techniques to recognize a broad range of implicit reduction operations, including those involving statements at multiple loop nesting levels and intermixed with conditional control flow. We introduce two new optimizations: factoring which increases data locality for SUM and PRODUCT reductions, and index encoding which enables a single global communication to accomplish both an extreme value reduction and an extreme value location reduction. We have implemented these techniques in the dHPF compiler for High Performance Fortran (HPF). We evaluate their effectiveness experimentally by compiling several reduction benchmarks with dHPF and two commercial HPF compilers, and comparing the performance of the generated code on an IBM SP2. Our results show that our recognition techniques are more powerful and that our index encoding and factoring optimizations can improve performance by a factor of two where they apply.