Optimization of fast Fourier transforms on the Blue Gene/L supercomputer

  • Authors:
  • Yogish Sabharwal;Saurabh K. Garg;Rahul Garg;John A. Gunnels;Ramendra K. Sahoo

  • Affiliations:
  • IBM India Research Laboratory, New Delhi, India;Grid Comp. and Dist. Sys. Lab, Deptt of Comp. Sc. and Software Engg, The University of Melbourne, Australia;IBM India Research Laboratory, New Delhi, India;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • HiPC'08 Proceedings of the 15th international conference on High performance computing
  • Year:
  • 2008

Abstract

We analyze the bottlenecks in the parallel FFT algorithm and describe optimizations carried out for the algorithm on the Blue Gene/L supercomputer. We identified three avenues for improving the performance of the algorithm: single-node FFT performance, Alltoall collective performance, and overlap of computation and communication. Performance at all these levels has been optimized using the double-hummer intrinsics of the Blue Gene/L CPU, careful ordering and synchronization of messages in Alltoall communications, and suitable interleaving of message exchanges with computations. Using these optimizations, we obtained a 20% performance improvement over the baseline version on the 64-rack Blue Gene/L system. We give a brief overview of the Alltoall optimizations, describe our computation-communication overlap strategy, and present results for strong scaling and weak scaling of parallel FFT on Blue Gene/L. We also discuss the fundamental limits to scaling of the parallel transpose algorithm for computing FFT.
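The abstract refers to the parallel transpose algorithm without spelling it out. The following is a minimal single-process sketch of the general idea, assuming a 2D transform for illustration (the abstract does not state the dimensionality or the data decomposition used in the paper). All function names here are hypothetical, and the in-memory `transpose` merely stands in for the Alltoall exchange that a distributed run would perform between nodes.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def transpose(a):
    """Stand-in for the Alltoall step (assumption): on a real machine each
    node would send one block of its rows to every other node."""
    return [list(col) for col in zip(*a)]

def fft2d_transpose(a):
    """2D FFT computed as: row FFTs, transpose, row FFTs, transpose back."""
    rows = [fft(r) for r in a]      # local 1D FFTs along the first layout
    cols = transpose(rows)          # global data exchange
    cols = [fft(c) for c in cols]   # local 1D FFTs along the other dimension
    return transpose(cols)          # restore the original layout

def dft2d_direct(a):
    """Direct O(N^4) 2D DFT, used only as a reference for checking."""
    n, m = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / n + v * c / m))
                 for r in range(n) for c in range(m))
             for v in range(m)]
            for u in range(n)]
```

In this sketch the two calls to `fft` over rows correspond to the single-node FFT work, and the calls to `transpose` correspond to the communication phase. These are the two components whose optimization and overlap the abstract discusses.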