This paper presents results on a communications-intensive kernel, the three-dimensional fast Fourier transform (3D FFT), running on the 2,048-node Blue Gene®/L (BG/L) prototype. Two implementations of the volumetric FFT algorithm were characterized, one built on the Message Passing Interface library and another built on an active packet Application Program Interface supported by the hardware bring-up environment, the BG/L advanced diagnostics environment. Preliminary performance experiments on the BG/L prototype indicate that both of our implementations scale well up to 1,024 nodes for 3D FFTs of size 128 × 128 × 128. The performance of the volumetric FFT is also compared with that of the Fastest Fourier Transform in the West (FFTW) library. In general, the volumetric FFT outperforms a port of the FFTW Version 2.1.5 library on large-node-count partitions.
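The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general property that distributed 3D FFTs build on: a 3D transform can be computed as 1D transforms applied along each axis in turn. It is not the paper's volumetric implementation. The function names (`dft_1d`, `dft_3d_by_axes`, `dft_3d_direct`) are hypothetical, and a naive O(n^2) DFT stands in for an optimized 1D FFT to keep the example self-contained.

```python
import cmath


def dft_1d(x):
    """Naive O(n^2) 1D DFT; a stand-in for an optimized 1D FFT routine."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_3d_by_axes(a):
    """3D DFT of a nested list a[x][y][z], computed one axis at a time."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[complex(a[x][y][z]) for z in range(nz)]
            for y in range(ny)] for x in range(nx)]

    # Pass 1: transform every line along the z axis.
    for x in range(nx):
        for y in range(ny):
            out[x][y] = dft_1d(out[x][y])

    # Pass 2: transform every line along the y axis.
    for x in range(nx):
        for z in range(nz):
            line = dft_1d([out[x][y][z] for y in range(ny)])
            for y in range(ny):
                out[x][y][z] = line[y]

    # Pass 3: transform every line along the x axis.
    for y in range(ny):
        for z in range(nz):
            line = dft_1d([out[x][y][z] for x in range(nx)])
            for x in range(nx):
                out[x][y][z] = line[x]

    return out


def dft_3d_direct(a):
    """Reference 3D DFT from the triple-sum definition, for checking only."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    result = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for kx in range(nx):
        for ky in range(ny):
            for kz in range(nz):
                total = 0j
                for x in range(nx):
                    for y in range(ny):
                        for z in range(nz):
                            phase = kx * x / nx + ky * y / ny + kz * z / nz
                            total += a[x][y][z] * cmath.exp(-2j * cmath.pi * phase)
                result[kx][ky][kz] = total
    return result
```

On a parallel machine, each of the three passes needs complete lines along the current axis to be local to a node, so data is typically redistributed between passes. That redistribution step is what makes the 3D FFT communication-intensive; the sketch above shows only the arithmetic, not the communication.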