MPI-aware compiler optimizations for improving communication-computation overlap

Authors:
Anthony Danalis; Lori Pollock; Martin Swany; John Cavazos
Affiliations:
University of Delaware, Newark, DE, USA (all authors)
Venue:
Proceedings of the 23rd international conference on Supercomputing
Year:
2009

Abstract

Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as a black box with unknown side effects and therefore miss potential optimizations. This paper's contributions enable the development of an MPI-aware optimizing compiler that exploits knowledge of the effects of MPI calls to perform transformations that increase communication-computation overlap. We formulate a set of data flow equations and rules that describe the side effects of key MPI functions, so that an MPI-aware compiler can automatically assess whether a transformation is safe. After categorizing existing compiler transformations by their effect on the application code, we present an optimization algorithm that specifies when and how to apply these transformations to improve communication-computation overlap. By manually applying the optimization algorithm to kernels extracted from HYCOM and the NAS benchmarks, we show that even for these highly optimized codes, execution time decreases by over 30% on average.
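To make the idea of "knowing the side effects of MPI calls" concrete, the following is a minimal Python sketch, not the paper's data flow equations or algorithm. It models MPI_Irecv, MPI_Isend, and MPI_Wait as restrictions on buffers and applies one overlap-increasing transformation: sinking each wait past later computation that does not touch the in-flight buffer. The IR classes (`Stmt`, `Isend`, `Irecv`, `Wait`), the function names, and the buffer names are all hypothetical. Two assumptions are built in: reading a pending send buffer is treated as legal (as allowed since MPI-3.0; a compiler targeting older standards would also forbid reads), and a wait is never moved past another MPI call.

```python
"""Sketch: sink MPI_Wait past independent computation (hypothetical IR)."""
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Stmt:
    """A computation statement with its read and write sets (buffer names)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


@dataclass(frozen=True)
class Isend:
    """Models MPI_Isend: posts a send of `buf`, completed by Wait(req)."""
    req: str
    buf: str


@dataclass(frozen=True)
class Irecv:
    """Models MPI_Irecv: posts a receive into `buf`, completed by Wait(req)."""
    req: str
    buf: str


@dataclass(frozen=True)
class Wait:
    """Models MPI_Wait on request `req`."""
    req: str


Op = Union[Stmt, Isend, Irecv, Wait]


def conflicts(stmt: Stmt, pending: Union[Isend, Irecv]) -> bool:
    """True if `stmt` must not run while `pending` is still in flight."""
    if isinstance(pending, Irecv):
        # Receive buffer is undefined until the wait: no reads or writes.
        return pending.buf in stmt.reads or pending.buf in stmt.writes
    # Send buffer: writes forbidden; reads assumed legal (MPI-3.0 semantics).
    return pending.buf in stmt.writes


def posting_op(program: List[Op], wait_idx: int) -> Union[Isend, Irecv]:
    """Find the nonblocking call whose request the wait at wait_idx completes."""
    req = program[wait_idx].req
    for op in reversed(program[:wait_idx]):
        if isinstance(op, (Isend, Irecv)) and op.req == req:
            return op
    raise ValueError(f"no posting call for request {req!r}")


def sink_waits(program: List[Op]) -> List[Op]:
    """Move each Wait as late as its buffer dependences allow.

    Waits are processed last to first, so moving one never shifts an earlier
    wait. A wait stops before any other MPI call (conservative assumption)
    or before the first statement that conflicts with its buffer.
    """
    prog = list(program)
    wait_positions = [i for i, op in enumerate(prog) if isinstance(op, Wait)]
    for i in reversed(wait_positions):
        pending = posting_op(prog, i)
        j = i
        while j + 1 < len(prog):
            nxt = prog[j + 1]
            if not isinstance(nxt, Stmt) or conflicts(nxt, pending):
                break
            j += 1
        prog.insert(j, prog.pop(i))  # place the wait right after prog[j]
    return prog


def label(op: Op) -> str:
    """Compact printable name for an IR operation."""
    return op.name if isinstance(op, Stmt) else f"{type(op).__name__}({op.req})"


# Hypothetical halo exchange written with waits immediately after the posts.
program = [
    Irecv("r0", "halo"),
    Isend("s0", "edge"),
    Wait("r0"),
    Wait("s0"),
    Stmt("interior", reads=frozenset({"u_interior"}),
         writes=frozenset({"u_new_interior"})),
    Stmt("boundary", reads=frozenset({"halo", "edge"}),
         writes=frozenset({"u_new_edge"})),
]

print([label(op) for op in sink_waits(program)])
# prints ['Irecv(r0)', 'Isend(s0)', 'interior', 'Wait(r0)', 'boundary', 'Wait(s0)']
```

In the result, the interior update now executes while the halo receive is in flight, and only the boundary update, which reads `halo`, waits for it. A real MPI-aware compiler would reason more precisely than this sketch, for example about reordering waits relative to other communication calls.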