Performance measurements of the 3D FFT on the blue gene/l supercomputer

  • Authors:
  • Maria Eleftheriou;Blake Fitch;Aleksandr Rayshubskiy;T. J. Christopher Ward;Robert Germain

  • Affiliations:
  • IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized, one built on the MPI library using an optimized collective all-to-all operation [2] and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Advanced Diagnostics Environment (BG/L ADE) [17]. We compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives) and to a port of version 2.1.5 the FFTW library [14]. Performance experiments on the Blue Gene/L prototype indicate that both of our implementations scale well and the current MPI-based implementation shows a speedup of 730 on 2048 nodes for 3D FFTs of size 128 × 128 × 128. Moreover, the volumetric FFT outperforms FFTW port by a factor 8 for a 128× 128× 128 complex FFT on 2048 nodes.