The original Cooley-Tukey FFT was published in 1965 and presented for sequences whose length N is a power of two. In the same paper, however, the authors noted that their algorithm could be generalized to composite N, in which the length of the sequence is a product of small primes. In 1967, Bergland presented an algorithm for composite N, and variants of his mixed radix FFT are currently in wide use. In 1968, Bluestein presented an FFT for arbitrary N, including large primes. For composite N, however, Bluestein's FFT was not competitive with Bergland's FFT. Since it is usually possible to select a composite N, Bluestein's FFT did not receive much attention. Nevertheless, because of its minimal communication requirements, the Bluestein FFT may be the algorithm of choice on multiprocessors, particularly those with the hypercube architecture. In contrast to the mixed radix FFT, the communication pattern of the Bluestein FFT maps quite well onto the hypercube. With P = 2^d processors, an ordered Bluestein FFT requires 2d communication cycles with packet length N/2P, which is comparable to the requirements of a power-of-two FFT. For fine-grain computations, the Bluestein FFT requires 20 log_2 N computational cycles. Although this is double that required for a mixed radix FFT, the Bluestein FFT may nevertheless be preferred because of its lower communication costs. For most values of N, the Bluestein FFT is also shown to be superior to another alternative, namely parallel multiplication.
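To make the idea behind Bluestein's FFT concrete, the following is a minimal sequential sketch in Python. It is not the hypercube implementation discussed above. It only illustrates the standard chirp identity nk = (n^2 + k^2 - (k - n)^2)/2, which turns a DFT of arbitrary length N into a circular convolution that can be evaluated with power-of-two FFTs. The function names `fft_pow2` and `bluestein_dft` are hypothetical and chosen for illustration only.

```python
import cmath


def fft_pow2(x, sign=-1):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two.

    sign=-1 gives the forward transform, sign=+1 the unscaled inverse.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], sign)
    odd = fft_pow2(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """DFT of a sequence of arbitrary length N via Bluestein's chirp trick."""
    n = len(x)
    # Chirp w[j] = exp(-i*pi*j^2/N); j^2 is reduced mod 2N to keep angles small.
    w = [cmath.exp(-1j * cmath.pi * ((j * j) % (2 * n)) / n) for j in range(n)]

    # Pad to a power of two M >= 2N - 1 so the circular convolution
    # reproduces the linear one on the first N outputs.
    m = 1
    while m < 2 * n - 1:
        m *= 2

    # a: input premultiplied by the chirp, zero padded to length M.
    a = [x[j] * w[j] for j in range(n)] + [0j] * (m - n)

    # b: conjugate chirp, stored symmetrically so that b[(k - j) mod M]
    # equals conj(w[|k - j|]) for all 0 <= j, k < N.
    b = [0j] * m
    b[0] = w[0].conjugate()
    for j in range(1, n):
        b[j] = b[m - j] = w[j].conjugate()

    # Circular convolution through three power-of-two FFTs.
    fa = fft_pow2(a)
    fb = fft_pow2(b)
    conv = [v / m for v in fft_pow2([p * q for p, q in zip(fa, fb)], sign=+1)]

    # Postmultiply by the chirp to recover the DFT coefficients.
    return [w[k] * conv[k] for k in range(n)]
```

For example, `bluestein_dft([1, 2, 3, 4, 5, 6, 7])` computes a length-7 (prime) transform using only length-16 power-of-two FFTs. In this sketch the only non-local data movement occurs inside those power-of-two FFTs, which is consistent with the abstract's observation that the communication pattern resembles that of a power-of-two FFT.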