In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) using short vector SIMD instructions on clusters of PCs. We vectorized the FFT kernels using Intel's Streaming SIMD Extensions 2 (SSE2) instructions. We show that combining this vectorization with a block three-dimensional FFT algorithm improves performance effectively. Performance results for three-dimensional FFTs on a dual Xeon 2.8 GHz PC SMP cluster are reported. We achieved performance of over 5 GFLOPS on a 16-node dual Xeon 2.8 GHz PC SMP cluster.
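As background for the three-dimensional FFT mentioned above, the following is a minimal sketch of the separability property that multi-dimensional FFTs generally rely on: a 3D transform can be computed as 1D FFTs along each axis in turn. It does not reproduce the paper's blocking scheme, its SSE2 vectorization, or its cluster parallelization. The function names `fft1d` and `fft3d` and the nested-list layout `a[x][y][z]` are assumptions chosen for illustration, and every dimension is assumed to be a power of two.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 decimation-in-time FFT.

    Assumes len(x) is a power of two (illustrative restriction).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])  # transform of even-indexed samples
    odd = fft1d(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd half (butterfly step).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(a):
    """3D FFT of a nested list a[x][y][z] via 1D FFTs along z, then y, then x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])

    # z axis: the innermost lists are already contiguous 1D signals.
    b = [[fft1d(a[i][j]) for j in range(ny)] for i in range(nx)]

    # y axis: gather each strided line, transform it, scatter it back.
    for i in range(nx):
        for k in range(nz):
            line = fft1d([b[i][j][k] for j in range(ny)])
            for j in range(ny):
                b[i][j][k] = line[j]

    # x axis: same gather/transform/scatter pattern.
    for j in range(ny):
        for k in range(nz):
            line = fft1d([b[i][j][k] for i in range(nx)])
            for i in range(nx):
                b[i][j][k] = line[i]

    return b
```

In this sketch the y and x passes gather strided elements one line at a time, which is the kind of memory access pattern that blocked formulations are generally designed to improve. As a quick sanity check, a unit impulse at the origin should transform to an array of all ones, and a constant array should transform to a single nonzero value at index `[0][0][0]`.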