Towards autotuning by alternating communication methods

Authors:
Adrian Tineo;Sadaf R. Alam;Thomas C. Schulthess
Affiliations:
Swiss National Supercomputing Centre (CSCS), Manno, Switzerland;Swiss National Supercomputing Centre (CSCS), Manno, Switzerland;Swiss National Supercomputing Centre (CSCS), Manno, Switzerland
Venue:
Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems
Year:
2011

Abstract

Interconnects in emerging high performance computing systems feature hardware support for one-sided, asynchronous communication and for global address space programming models. The aim is to improve parallel efficiency and productivity by allowing communication to overlap with computation and messages to be delivered out of order. In practice, however, complex interactions between the software stack and the communication hardware make it challenging to obtain optimal performance for a full application expressed in a one-sided programming paradigm. Here, we present a proof-of-concept study for an autotuning framework that instantiates hybrid kernels, based on refactored codes, using the communication libraries or languages available on a Cray XE6 and an SGI Altix UV 1000. We validate our approach by improving the performance of bandwidth- and latency-bound kernels of interest in quantum physics and astrophysics by up to 35% and 80%, respectively.
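The idea of autotuning by alternating communication methods can be illustrated with a small empirical search. The sketch below is not the authors' framework. It assumes that a kernel is split into communication phases, that each phase can be implemented by one of several interchangeable methods (for example a two-sided and a one-sided variant), and that a "hybrid" kernel is one combination of per-phase choices. Every combination is timed and the fastest one is kept. The names `build_hybrids`, `time_variant`, and `autotune`, the phase names, and the method names are all hypothetical. The `time.sleep` calls only stand in for communication cost.

```python
import itertools
import time
from statistics import median
from typing import Callable, Dict, Tuple

# Hypothetical model: each communication phase maps method names to callables
# implementing that phase (e.g. a two-sided vs. a one-sided variant).
Phase = Dict[str, Callable[[], None]]


def build_hybrids(phases: Dict[str, Phase]) -> Dict[Tuple[str, ...], Callable[[], None]]:
    """Enumerate every kernel built from one method choice per phase."""
    names = list(phases)
    hybrids = {}
    # Iterating a dict yields its keys, so `choice` is a tuple of method names.
    for choice in itertools.product(*(phases[n] for n in names)):
        funcs = [phases[n][m] for n, m in zip(names, choice)]

        def kernel(funcs=funcs):  # default arg binds this iteration's list
            for f in funcs:
                f()

        hybrids[choice] = kernel
    return hybrids


def time_variant(kernel: Callable[[], None], repeats: int = 5, warmup: int = 1) -> float:
    """Median wall-clock time of `kernel` over `repeats` runs after warm-up."""
    for _ in range(warmup):
        kernel()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        kernel()
        samples.append(time.perf_counter() - start)
    return median(samples)


def autotune(phases: Dict[str, Phase], repeats: int = 5):
    """Time all hybrid kernels and return (best choice, all timings)."""
    hybrids = build_hybrids(phases)
    if not hybrids:
        raise ValueError("need at least one method per phase")
    timings = {choice: time_variant(k, repeats) for choice, k in hybrids.items()}
    best = min(timings, key=timings.get)
    return best, timings


# Usage with stand-in "communication" costs (sleep simulates a slow method).
def fast() -> None:
    pass


def slow() -> None:
    time.sleep(0.005)


phases = {
    "halo_exchange": {"two_sided": slow, "one_sided": fast},
    "reduction": {"two_sided": fast, "one_sided": slow},
}
best, timings = autotune(phases, repeats=3)
print(best)  # ('one_sided', 'two_sided')
```

On a real distributed-memory system, each timing would need to be synchronized across processes (for example with a barrier before the timed region) and reduced to the slowest rank before variants are compared. The simple median here assumes a single process.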