Designing and auto-tuning parallel 3-D FFT for computation-communication overlap

  • Authors:
  • Sukhyun Song;Jeffrey K. Hollingsworth

  • Affiliations:
  • University of Maryland, College Park, MD, USA;University of Maryland, College Park, MD, USA

  • Venue:
  • Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents a method to design and auto-tune a new parallel 3-D FFT code using the non-blocking MPI all-to-all operation. We achieve high performance by optimizing computation-communication overlap. Our code performs fully asynchronous communication without any support from special hardware. We also improve cache performance through loop tiling. To cope with the complex trade-off regarding our optimization techniques, we parameterize our code and auto-tune the parameters efficiently in a large parameter space. Experimental results from two systems confirm that our code achieves a speedup of up to 1.76x over the FFTW library.