Automatically tuned FFTs for BlueGene/L's double FPU

  • Authors:
  • Franz Franchetti;Stefan Kral;Juergen Lorenz;Markus Püschel;Christoph W. Ueberhuber

  • Affiliations:
  • Institute for Analysis and Scientific Computing, Vienna University of Technology, Wien, Austria;Institute for Analysis and Scientific Computing, Vienna University of Technology, Wien, Austria;Institute for Analysis and Scientific Computing, Vienna University of Technology, Wien, Austria;Dept. of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA;Institute for Analysis and Scientific Computing, Vienna University of Technology, Wien, Austria

  • Venue:
  • VECPAR'04: Proceedings of the 6th International Conference on High Performance Computing for Computational Science
  • Year:
  • 2004

Abstract

IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536-processor system with a peak performance of 360 Tflop/s. This system is expected to lead the TOP500 list once it is installed at Lawrence Livermore National Laboratory in 2005. This paper presents one of the first numerical kernels run on a prototype BlueGene/L machine. We tuned our formal vectorization approach, as well as the Vienna MAP vectorizer, to support BlueGene/L's custom two-way short vector SIMD "double" floating-point unit. We then connected the resulting methods to the automatic performance tuning systems Spiral and FFTW. Our approach produces automatically tuned, high-performance FFT kernels for BlueGene/L. When run on a single BlueGene/L processor, these kernels are up to 45% faster than the best scalar Spiral-generated code and up to 75% faster than FFTW.
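The abstract does not show how FFT arithmetic maps onto a two-way SIMD unit, so a small illustration may help. The sketch below is a plain Python model under stated assumptions. It is not the paper's generated code and not BlueGene/L's instruction set. The `Pair` type and the helpers `vadd`, `vsub`, `vmul`, `vswap`, `cmul`, and `butterfly` are hypothetical names chosen for illustration. Each `Pair` stands for one two-way register. Each helper stands for a single operation applied to both slots at once. One complex number is stored as (re, im) in a single pair. The model then expresses a radix-2 FFT butterfly, y0 = a + w·b and y1 = a − w·b, using only such paired operations plus a slot swap.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """Hypothetical model of one two-way SIMD register holding two doubles."""
    p: float  # primary slot (here: real part)
    s: float  # secondary slot (here: imaginary part)


def vadd(a: Pair, b: Pair) -> Pair:
    # One modeled "instruction": add both slots at once.
    return Pair(a.p + b.p, a.s + b.s)


def vsub(a: Pair, b: Pair) -> Pair:
    # Subtract both slots at once.
    return Pair(a.p - b.p, a.s - b.s)


def vmul(a: Pair, b: Pair) -> Pair:
    # Multiply both slots at once.
    return Pair(a.p * b.p, a.s * b.s)


def vswap(a: Pair) -> Pair:
    # Exchange the two slots: (re, im) -> (im, re).
    return Pair(a.s, a.p)


def cmul(x: Pair, w: Pair) -> Pair:
    """Complex product x*w, with x and w each stored as (re, im) in one pair."""
    t = vmul(x, Pair(w.p, w.p))          # (xr*wr, xi*wr)
    u = vmul(vswap(x), Pair(-w.s, w.s))  # (-xi*wi, xr*wi)
    return vadd(t, u)                    # (xr*wr - xi*wi, xi*wr + xr*wi)


def butterfly(a: Pair, b: Pair, w: Pair) -> tuple[Pair, Pair]:
    """Radix-2 butterfly with twiddle factor w: returns (a + w*b, a - w*b)."""
    wb = cmul(b, w)
    return vadd(a, wb), vsub(a, wb)
```

Take a = 1+2i, b = 3+4i, and twiddle w = −i, that is, `butterfly(Pair(1, 2), Pair(3, 4), Pair(0, -1))`. The call yields the pairs (5, −1) and (−3, 5), which represent 5−i and −3+5i. Storing one complex value per register is only one possible layout. A vectorizer could instead pack the same component of two independent butterflies into one register. That layout avoids the swap but requires the data to be reordered.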