Paper: Bluestein's FFT for arbitrary N on the hypercube

Authors:
Paul N. Swarztrauber;Roland A. Sweet;William L. Briggs;Van Emden Henson;James Otto
Affiliations:
National Center for Atmospheric Research, Boulder, CO 80307, USA;Computational Mathematics Group, Department of Mathematics, University of Colorado at Denver, Denver, CO 80204, USA;Computational Mathematics Group, Department of Mathematics, University of Colorado at Denver, Denver, CO 80204, USA;Department of Mathematics, Naval Postgraduate School, Monterey, CA 93933, USA;Computational Mathematics Group, Department of Mathematics, University of Colorado at Denver, Denver, CO 80204, USA
Venue:
Parallel Computing
Year:
1991


Abstract

The original Cooley-Tukey FFT was published in 1965 and presented for sequences whose length N is a power of two. In the same paper, however, the authors noted that their algorithm could be generalized to composite N, in which the length of the sequence is a product of small primes. In 1967, Bergland presented an algorithm for composite N, and variants of his mixed radix FFT are currently in wide use. In 1968, Bluestein presented an FFT for arbitrary N, including large primes. For composite N, however, Bluestein's FFT was not competitive with Bergland's FFT. Since it is usually possible to select a composite N, Bluestein's FFT did not receive much attention. Nevertheless, because of its minimal communication requirements, the Bluestein FFT may be the algorithm of choice on multiprocessors, particularly those with the hypercube architecture. In contrast to the mixed radix FFT, the communication pattern of the Bluestein FFT maps quite well onto the hypercube. With P = 2^d processors, an ordered Bluestein FFT requires 2d communication cycles with packet length N/2P, which is comparable to the requirements of a power-of-two FFT. For fine-grain computations, the Bluestein FFT requires 20 log_2 N computational cycles. Although this is double that required for a mixed radix FFT, the Bluestein FFT may nevertheless be preferred because of its lower communication costs. For most values of N it is also shown to be superior to another alternative, namely parallel multiplication.
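
The abstract does not spell out the algorithm itself. As a rough illustration only, the following serial Python sketch assumes the standard chirp formulation usually attributed to Bluestein: writing nk = (n^2 + k^2 - (k - n)^2)/2 turns a length-N DFT into a convolution, which can then be evaluated with power-of-two FFTs of length M >= 2N - 1. The function names `fft_pow2` and `bluestein_dft` are hypothetical, and the sketch makes no attempt to model the hypercube communication pattern or the cycle counts discussed in the paper.

```python
import cmath


def fft_pow2(a, inverse=False):
    """Unscaled recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft_pow2(a[0::2], inverse)
    odd = fft_pow2(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """DFT of arbitrary length N via a chirp convolution (illustrative sketch)."""
    n = len(x)
    if n == 0:
        return []
    # chirp[k] = exp(-i*pi*k^2/N); k^2 is reduced mod 2N to keep the angle small.
    chirp = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    # Smallest power of two M with M >= 2N - 1, so the circular
    # convolution below agrees with the linear one on indices 0..N-1.
    m = 1
    while m < 2 * n - 1:
        m *= 2
    # Pre-multiplied input, zero-padded to length M.
    a = [x[k] * chirp[k] for k in range(n)] + [0j] * (m - n)
    # Conjugate chirp, stored symmetrically so that b[-k mod M] == b[k].
    b = [0j] * m
    b[0] = chirp[0].conjugate()
    for k in range(1, n):
        b[k] = b[m - k] = chirp[k].conjugate()
    # Circular convolution through three power-of-two FFTs.
    fa = fft_pow2(a)
    fb = fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], inverse=True)
    # Scale the unscaled inverse by 1/M and post-multiply by the chirp.
    return [chirp[k] * conv[k] / m for k in range(n)]
```

For example, `bluestein_dft([1, 2, 3, 4, 5, 6, 7])` handles the prime length N = 7 using FFTs of length 16; the first output entry should be close to 28, the sum of the inputs. The three power-of-two transforms are the reason the method can reuse the communication pattern of a power-of-two FFT, at the cost of extra arithmetic.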