Data packing before and after communication can account for as much as 90% of the communication time on modern computers. Despite MPI's well-defined datatype interface for non-contiguous data access, many codes use manual pack loops for performance reasons. Programmers write access-pattern-specific pack loops (e.g., with manual unrolling) for which compilers emit optimized code. In contrast, the MPI implementations in use today interpret datatypes at pack time, which incurs high overheads. In this work we explore the effectiveness of runtime compilation techniques that generate efficient, optimized pack code for MPI datatypes at commit time. Thus, none of the overhead of datatype interpretation is incurred at pack time, and pack setup is as fast as calling a function pointer. We have implemented a library called libpack that can be used to compile MPI datatypes and to (un)pack data with them. The library optimizes the datatype representation and uses the LLVM framework to produce vectorized machine code for each datatype at commit time. We show several examples of how MPI datatype pack functions benefit from runtime compilation and analyze the performance of the compiled pack functions for data access patterns found in many applications. The pack/unpack functions generated by our packing library are seven times faster than those of prevalent MPI implementations for 73% of the datatypes used in a scientific application, and in many cases they outperform manual pack loops.