Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
Minimizing development and maintenance costs in supporting persistently optimized BLAS
Software—Practice & Experience - Research Articles
STAR-MPI: self tuned adaptive routines for MPI collective operations
Proceedings of the 20th annual international conference on Supercomputing
Concurrency and Computation: Practice & Experience - International Supercomputing Conference
An essential part of an empirical optimization library is the set of timing procedures with which the performance of different codelets is determined. In this paper, we present four different timing methods for optimizing collective MPI communications and compare their accuracy using the FFT NAS Parallel Benchmark on a variety of systems with different MPI implementations. We find that timing larger code portions with infrequent synchronizations performs well on all systems.