ICS '88 Proceedings of the 2nd international conference on Supercomputing
Compiling Fortran D for MIMD distributed-memory machines
Multiprocessor performance measurement and evaluation
An HPF compiler for the IBM SP2
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Advanced compiler design and implementation
Advanced compiler design and implementation
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
A Compilation Approach for Fortran 90D/ HPF Compilers
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
A New Approach to Array Redistribution: Strip Mining Redistribution
PARLE '94 Proceedings of the 6th International PARLE Conference on Parallel Architectures and Languages Europe
CC--MPI: a compiled communication capable MPI prototype for ethernet switched clusters
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Titanium Language Reference Manual
Titanium Language Reference Manual
GASNet Specification, v1.1
Communication Optimizations for Fine-Grained UPC Applications
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Transformations to Parallel Codes for Communication-Computation Overlap
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Representation-independent program analysis
PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Data-Flow Analysis for MPI Programs
ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Automatic nonblocking communication for partitioned global address space programs
Proceedings of the 21st annual international conference on Supercomputing
Implementation and performance analysis of non-blocking collective operations for MPI
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Leveraging non-blocking collective communication in high-performance applications
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Optimizing bandwidth limited problems using one-sided communication and overlap
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Precise dynamic analysis for slack elasticity: adding buffering without adding bugs
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Towards autotuning by alternating communication methods
Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems
Delta Send-Recv for Dynamic Pipelining in MPI Programs
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Multiparty session c: safe parallel programming with message optimisation
TOOLS'12 Proceedings of the 50th international conference on Objects, Models, Components, Patterns
Towards autotuning by alternating communication methods
ACM SIGMETRICS Performance Evaluation Review
MPI and compiler technology: a love-hate relationship
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Exact dependence analysis for increased communication overlap
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Globalizing selectively: shared-memory efficiency with address-space separation
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Heterogeneous-race-free memory models
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as a black box with unknown side effects and thus miss potential optimizations. This paper's contributions enable the development of an MPI-aware optimizing compiler that can perform transformations exploiting knowledge of MPI call effects to increase communication-computa-tion overlap. We formulate a set of data flow equations and rules to describe the side effects of key MPI functions so an MPI-aware compiler can automatically assess the safety of transformations. After categorizing existing compiler transformations based on their effect on the application code, we present an optimization algorithm that specifies when and how to apply these optimizing transformations to achieve improved communication-computation overlap. By manually applying the optimization algorithm to kernels extracted from HYCOM and the NAS benchmarks, we show that even when transforming these highly optimized codes, execution time can be decreased by an average of over 30%.