In this paper, we propose a blocking algorithm for parallel one-dimensional fast Fourier transforms (FFTs) on shared-memory parallel computers. The proposed algorithm is based on the six-step FFT algorithm. The resulting block six-step FFT algorithm improves performance by using cache memory effectively. We report performance results for one-dimensional FFTs on the SGI Onyx 3400 and the Sun Enterprise 6000. The algorithm achieved about 1929 MFLOPS on the SGI Onyx 3400 (MIPS R12000 400 MHz, 16 CPUs) and about 520 MFLOPS on the Sun Enterprise 6000 (UltraSPARC 168 MHz, 16 CPUs).
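For orientation, the plain (unblocked) six-step FFT on which the proposed algorithm is based can be sketched as follows. This is a minimal serial illustration, not the paper's implementation: the function names `dft`, `transpose`, and `six_step_fft` are hypothetical, the naive O(n^2) `dft` stands in for an optimized short FFT kernel, and neither the blocking nor the parallelization described in the abstract is shown. It assumes the usual factorization N = n1 * n2, with the input viewed as an n2-by-n1 array.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; a stand-in (assumption) for an optimized FFT kernel."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def six_step_fft(x, n1, n2):
    """Length-(n1*n2) DFT of x via the six-step decomposition (illustrative)."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # View x as an n2-by-n1 array: a[j2][j1] = x[j1 + j2 * n1].
    a = [list(x[j2 * n1:(j2 + 1) * n1]) for j2 in range(n2)]

    # Step 1: transpose, giving n1 rows of length n2.
    b = transpose(a)

    # Step 2: n1 independent DFTs of length n2; result is c[j1][k2].
    c = [dft(row) for row in b]

    # Step 3: multiply by the twiddle factors exp(-2*pi*i * j1 * k2 / n).
    c = [
        [c[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n) for k2 in range(n2)]
        for j1 in range(n1)
    ]

    # Step 4: transpose, giving n2 rows of length n1.
    d = transpose(c)

    # Step 5: n2 independent DFTs of length n1; result is e[k2][k1].
    e = [dft(row) for row in d]

    # Step 6: transpose and flatten, so that y[k2 + k1 * n2] = f[k1][k2].
    f = transpose(e)
    return [value for row in f for value in row]
```

The three transposes are what make the plain six-step formulation memory-intensive for large N, since each one sweeps the whole array. A blocked variant, as the abstract describes, would aim to keep the working set in cache across these steps; how the paper arranges the blocks is not specified here.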