Vectorization techniques for the Blue Gene/L double FPU

  • Authors:
  • J. Lorenz; S. Kral; F. Franchetti; C. W. Ueberhuber

  • Affiliations:
  • Institute for Analysis and Scientific Computing, Vienna University of Technology, Vienna, Austria; Institute for Analysis and Scientific Computing, Vienna University of Technology, Vienna, Austria; Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, Pennsylvania; Institute for Analysis and Scientific Computing, Vienna University of Technology, Vienna, Austria

  • Venue:
  • IBM Journal of Research and Development
  • Year:
  • 2005

Abstract

This paper presents vectorization techniques tailored to the two-way single-instruction multiple-data (SIMD) double-precision floating-point unit (FPU), a core element of the node application-specific integrated circuit (ASIC) chips of the IBM 360-teraflops Blue Gene®/L supercomputer. It focuses on the general-purpose basic-block vectorization and optimization methods incorporated in the Vienna MAP vectorizer and optimizer. These techniques, which have consistently delivered superior performance and portability across a wide range of platforms, were carried over to prototypes of Blue Gene/L and combined with the automatic performance-tuning system known as Fastest Fourier Transform in the West (FFTW). Working with the compiler technologies presented in this paper, the FFTW performance-optimization facilities produce vectorized fast Fourier transform (FFT) codes that are tuned automatically to single Blue Gene/L processors and are up to 80% faster than the best-performing scalar FFT codes generated by FFTW.
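To make the idea of two-way basic-block vectorization concrete, the following is a minimal sketch in portable C of how the four scalar operations of a radix-2 FFT butterfly can be paired into two two-way operations. It is an illustration only, not output of the Vienna MAP vectorizer and not Blue Gene/L code: the `vec2` type and the `vec2_add`, `vec2_sub`, `butterfly_scalar`, and `butterfly_vec2` names are hypothetical, and the two-element struct merely stands in for a two-way SIMD register.

```c
/* Hypothetical two-element vector type standing in for one
 * two-way SIMD register (an assumption for illustration). */
typedef struct {
    double lo;  /* first lane, here the real part       */
    double hi;  /* second lane, here the imaginary part */
} vec2;

/* Lane-wise addition: one two-way operation replaces two scalar adds. */
static vec2 vec2_add(vec2 a, vec2 b) {
    vec2 r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Lane-wise subtraction: one two-way operation replaces two scalar subtracts. */
static vec2 vec2_sub(vec2 a, vec2 b) {
    vec2 r = { a.lo - b.lo, a.hi - b.hi };
    return r;
}

/* Scalar basic block: a radix-2 butterfly on complex inputs stored as
 * {re, im}. It contains four independent scalar operations. */
static void butterfly_scalar(const double x0[2], const double x1[2],
                             double y0[2], double y1[2]) {
    y0[0] = x0[0] + x1[0];
    y0[1] = x0[1] + x1[1];
    y1[0] = x0[0] - x1[0];
    y1[1] = x0[1] - x1[1];
}

/* The same basic block after pairing: the two adds share one two-way
 * operation and the two subtracts share another. */
static void butterfly_vec2(vec2 x0, vec2 x1, vec2 *y0, vec2 *y1) {
    *y0 = vec2_add(x0, x1);
    *y1 = vec2_sub(x0, x1);
}
```

Both versions compute the same results; the paired form simply issues half as many arithmetic operations. Real FFT basic blocks also contain multiplications by twiddle factors and data reorderings, which make finding good pairings considerably harder than in this sketch.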