Data is often communicated from different locations in application memory and is commonly serialized (copied) into send buffers or from receive buffers. MPI derived datatypes avoid such intermediate copies and can thus optimize communication; however, it is often unclear which implementation and optimization choices are most useful in practice. We extracted the send/receive-buffer access patterns of a representative set of scientific applications into micro-applications that isolate their data access patterns, and we observed that these buffer-access patterns fall into three categories. Our micro-applications show that up to 90% of the total communication time can be spent on local serialization, and we found significant performance differences between state-of-the-art MPI implementations. Our micro-applications aim to provide a standard benchmark for MPI datatype implementations, guiding optimizations much as SPEC CPU and the Livermore Loops do for compiler optimizations.