Automatic memory optimizations for improving MPI derived datatype performance

Authors:
Surendra Byna;Xian-He Sun;Rajeev Thakur;William Gropp
Affiliations:
Department of Computer Science, Illinois Institute of Technology, Chicago, IL;Department of Computer Science, Illinois Institute of Technology, Chicago, IL;Math. and Computer Science Division, Argonne National Laboratory, Argonne, IL;Math. and Computer Science Division, Argonne National Laboratory, Argonne, IL
Venue:
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Year:
2006

Abstract

MPI derived datatypes allow users to describe noncontiguous memory layouts and to communicate noncontiguous data with a single communication function. This powerful feature enables an MPI implementation to optimize the transfer of noncontiguous data. In practice, however, many implementations of MPI derived datatypes perform poorly, which leads application developers to avoid the feature. In this paper, we present a technique that automatically selects templates optimized for memory performance, based on the access pattern of the derived datatype. We implement this mechanism in the MPICH2 source code. We compare the performance of our implementation with that of well-written manual packing/unpacking routines and with that of the original MPICH2 implementation. We show that performance for various derived datatypes is significantly improved and is comparable to that of optimized manual routines.
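
To make the comparison baseline concrete, the following is a minimal sketch of a manual packing/unpacking routine for one simple noncontiguous layout: `count` blocks of `blocklen` doubles whose starting positions are `stride` doubles apart. This is the kind of layout that `MPI_Type_vector` describes. The function names `pack_vector` and `unpack_vector` are hypothetical and chosen for illustration only. The sketch does not reproduce the paper's template-selection mechanism or any MPICH2 code; it only shows what "manual packing" means for a strided access pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the paper or MPICH2): gather `count` blocks
 * of `blocklen` doubles, whose starts lie `stride` doubles apart in `src`,
 * into the contiguous buffer `dst`. Returns the number of doubles copied. */
size_t pack_vector(const double *src, double *dst,
                   size_t count, size_t blocklen, size_t stride)
{
    assert(stride >= blocklen); /* assumption: blocks do not overlap */
    for (size_t i = 0; i < count; i++) {
        memcpy(dst + i * blocklen, src + i * stride,
               blocklen * sizeof(double));
    }
    return count * blocklen;
}

/* Hypothetical inverse: scatter a contiguous buffer `src` back into the
 * strided layout in `dst`. Returns the number of doubles copied. */
size_t unpack_vector(const double *src, double *dst,
                     size_t count, size_t blocklen, size_t stride)
{
    assert(stride >= blocklen); /* assumption: blocks do not overlap */
    for (size_t i = 0; i < count; i++) {
        memcpy(dst + i * stride, src + i * blocklen,
               blocklen * sizeof(double));
    }
    return count * blocklen;
}
```

For example, packing one column of a 3x4 row-major matrix corresponds to `count = 3`, `blocklen = 1`, `stride = 4`, with `src` pointing at the first element of that column. With a derived datatype, the same layout would be described once and passed directly to the communication call, leaving the copy strategy to the MPI implementation.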