FFT algorithms for vector computers

Authors: Paul N. Swarztrauber
Venue: Parallel Computing
Year: 1984

Abstract

The adaptation of the Cooley-Tukey, the Pease and the Stockham FFTs to vector computers is discussed. Each of these algorithms computes the same result, namely the discrete Fourier transform. They differ only in the way that intermediate computations are stored. Yet it is this difference that makes one or the other more appropriate depending on the application. This difference also influences the computational efficiency on a vector computer and motivates the development of methods to improve efficiency. Each of the FFTs is defined rigorously by a short expository FORTRAN program, which provides the basis for discussions about vectorization. Several methods for lengthening vectors are discussed, including the case of multiple and multi-dimensional transforms, where M sequences of length N can be transformed as a single sequence of length MN using a 'truncated' FFT. The implementation of an in-place FFT on a computer with memory-to-memory architecture is made possible by in-place matrix-vector multiplication.
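To make the idea of lengthening vectors with multiple transforms concrete, the following is a minimal sketch in Python, not the paper's FORTRAN programs. It assumes a radix-2 Stockham-style (autosort) formulation and an interleaved layout in which element p of sequence q sits at index q + M*p; the function names `stockham_multi_fft` and `naive_dft` are hypothetical. In this formulation, transforming M interleaved sequences of length N amounts to running only those stages of a length-MN transform whose stride is at least M. Whether that matches the paper's 'truncated' FFT in detail is an assumption.

```python
import cmath


def stockham_multi_fft(data, n, m=1):
    """Forward DFT of m interleaved sequences of length n (n a power of two).

    Element p of sequence q is read from data[q + m*p]; the result uses the
    same layout. A new list is returned and data is left unmodified.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if len(data) != n * m:
        raise ValueError("len(data) must equal n * m")

    x = [complex(v) for v in data]   # current stage input
    y = [0j] * len(x)                # work array: intermediate results alternate between x and y
    stride = m                       # number of independent sub-sequences in this stage

    while n > 1:
        half = n // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / n)   # twiddle factor for this butterfly
            # Inner loop of length `stride` (always >= m): the loop a vector
            # machine would execute as a single vector operation.
            for q in range(stride):
                a = x[q + stride * p]
                b = x[q + stride * (p + half)]
                y[q + stride * (2 * p)] = a + b
                y[q + stride * (2 * p + 1)] = (a - b) * w
        x, y = y, x                  # output of this stage feeds the next
        n = half
        stride *= 2
    return x


def naive_dft(seq):
    """Direct O(n^2) DFT, used only as a reference for checking results."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

With m = 1 this reduces to an ordinary autosort transform whose output is in natural order without a separate bit-reversal pass. With m > 1 the shortest inner loop has length m rather than 1, which is the kind of vector lengthening the abstract describes.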