The Future Fast Fourier Transform?
SIAM Journal on Scientific Computing
An overview of the BlueGene/L Supercomputer
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
The Fastest Fourier Transform in the West
The Fastest Fourier Transform in the West
Blue Matter, an application framework for molecular simulation on blue gene
Journal of Parallel and Distributed Computing - High-performance computational biology
Blue Gene: a vision for protein science using a petaflop supercomputer
IBM Systems Journal - Deep computing for the life sciences
Optimization of MPI collective communication on BlueGene/L systems
Proceedings of the 19th annual international conference on Supercomputing
The development and integration of a distributed 3D FFT for a cluster of workstations
ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Overview of the Blue Gene/L system architecture
IBM Journal of Research and Development
Blue Gene/L advanced diagnostics environment
IBM Journal of Research and Development
Design and implementation of message-passing services for the Blue Gene/L supercomputer
IBM Journal of Research and Development
Vectorization techniques for the Blue Gene/L double FPU
IBM Journal of Research and Development
IBM Journal of Research and Development
Blue matter: approaching the limits of concurrency for classical molecular dynamics
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Dynamic reuse of subroutine results
Journal of Systems Architecture: the EUROMICRO Journal
A study of the effects of machine geometry and mapping on distributed transpose performance
Proceedings of the 5th conference on Computing frontiers
Blue matter: scaling of N-body simulations to one atom per node
IBM Journal of Research and Development
Scalable molecular dynamics with NAMD on the IBM Blue Gene/L system
IBM Journal of Research and Development
A 32x32x32, spatially distributed 3D FFT in four microseconds on Anton
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Beyond homogeneous decomposition: scaling long-range forces on Massively Parallel Systems
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Progress in scaling biomolecular simulations to petaflop scale platforms
Euro-Par'06 Proceedings of the CoreGRID 2006, UNICORE Summit 2006, Petascale Computational Biology and Bioinformatics conference on Parallel processing
Parallel implementation of the replica exchange molecular dynamics algorithm on blue gene/L
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Blue matter: strong scaling of molecular dynamics on blue gene/l
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
High performance 3D convolution for protein docking on IBM blue gene
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
This paper presents performance characteristics of a communications-intensive kernel, the complex-data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized: one built on the MPI library using an optimized collective all-to-all operation [2], and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Advanced Diagnostics Environment (BG/L ADE) [17]. We compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives) and to a port of version 2.1.5 of the FFTW library [14]. Performance experiments on the Blue Gene/L prototype indicate that both of our implementations scale well, and the current MPI-based implementation shows a speedup of 730 on 2048 nodes for 3D FFTs of size 128 × 128 × 128. Moreover, the volumetric FFT outperforms the FFTW port by a factor of 8 for a 128 × 128 × 128 complex FFT on 2048 nodes.
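The kernel discussed above rests on a standard property: a 3D FFT can be computed as three passes of 1D FFTs, one along each axis. The following is a minimal single-process sketch of that decomposition in Python, not the paper's volumetric implementation. The function names (`fft1d`, `fft3d`, `dft3d_direct`) are hypothetical and chosen for illustration, and the grid dimensions are assumed to be powers of two.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(a):
    """3D FFT of a[x][y][z], computed as three passes of 1D FFTs."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    # Work on a complex copy so the caller's data is left untouched.
    a = [[[complex(v) for v in row] for row in plane] for plane in a]
    # Pass 1: transform along z (each row is already contiguous).
    for x in range(nx):
        for y in range(ny):
            a[x][y] = fft1d(a[x][y])
    # Pass 2: transform along y (gather a column, transform, scatter back).
    for x in range(nx):
        for z in range(nz):
            col = fft1d([a[x][y][z] for y in range(ny)])
            for y in range(ny):
                a[x][y][z] = col[y]
    # Pass 3: transform along x.
    for y in range(ny):
        for z in range(nz):
            col = fft1d([a[x][y][z] for x in range(nx)])
            for x in range(nx):
                a[x][y][z] = col[x]
    return a


def dft3d_direct(a):
    """Reference 3D DFT straight from the definition (small grids only)."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for kx in range(nx):
        for ky in range(ny):
            for kz in range(nz):
                s = 0j
                for x in range(nx):
                    for y in range(ny):
                        for z in range(nz):
                            phase = kx * x / nx + ky * y / ny + kz * z / nz
                            s += a[x][y][z] * cmath.exp(-2j * cmath.pi * phase)
                out[kx][ky][kz] = s
    return out
```

In this sketch the gather and scatter steps in passes 2 and 3 are plain memory accesses. In a distributed setting, assembling complete lines of data along an axis is the step that typically requires moving data between nodes, which is why a kernel of this kind is communication-intensive and why an all-to-all collective is a natural building block for it.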