A radix-2 FFT on the Connection Machine

Authors:
S. L. Johnsson;R. L. Krawitz;R. Frye;D. MacDonald
Affiliations:
Thinking Machines Corp., 245 First Street, Cambridge, MA;Thinking Machines Corp., 245 First Street, Cambridge, MA;Thinking Machines Corp., 245 First Street, Cambridge, MA;Thinking Machines Corp., 245 First Street, Cambridge, MA
Venue:
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Year:
1989

Abstract

We describe a radix-2 FFT implementation on the Connection Machine. The FFT implementation pipelines successive FFT stages to make full use of the communication capability of the network interconnecting processors when there are multiple elements assigned to each processor. Of particular interest in distributed memory architectures such as the Connection Machine is the allocation of twiddle factors to processors. We show that with a consecutive data allocation scheme and normal order input, a decimation-in-time FFT results in a factor of log₂ N less storage for twiddle factors than a decimation-in-frequency FFT for N processors. Similarly, with consecutive storage and bit-reversed input, a decimation-in-frequency FFT requires a factor of log₂ N less storage than a decimation-in-time FFT. The performance of the local FFT has a peak of about 3 Gflop/s. The "global" FFT has a peak performance of about 1.7 Gflop/s.
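The dependence of twiddle factors on data allocation can be illustrated with a small serial sketch. It is not the Connection Machine implementation and does not attempt to reproduce the paper's log₂ N storage accounting. It only shows, for normal-order input, which index bits select the twiddle exponent in each variant. All function names are hypothetical, and "consecutive allocation" is assumed here to mean that element i resides on processor i // (elements per processor).

```python
import cmath


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def dit_exponent(i, s, m):
    """Twiddle exponent at stage s for decimation-in-time, normal-order input.
    It depends only on the s highest-order bits of the element index i."""
    return bit_reverse(i >> (m - s), s) << (m - 1 - s)


def dif_exponent(i, s, m):
    """Twiddle exponent at stage s for decimation-in-frequency, normal-order input.
    It depends only on the index bits below the butterfly bit."""
    d = 1 << (m - 1 - s)
    return (i % d) << s


def fft_normal_input(x, variant):
    """Radix-2 FFT of a power-of-two-length list.
    Input is in normal order; output is in bit-reversed order."""
    n = len(x)
    m = n.bit_length() - 1
    assert n == 1 << m, "length must be a power of two"
    y = list(x)
    for s in range(m):                 # stage s pairs elements that differ in bit m-1-s
        d = 1 << (m - 1 - s)
        for i in range(n):
            if i & d:
                continue               # visit each butterfly once, from its upper element
            a, b = y[i], y[i + d]
            if variant == "dit":
                w = cmath.exp(-2j * cmath.pi * dit_exponent(i, s, m) / n)
                y[i], y[i + d] = a + w * b, a - w * b
            else:
                w = cmath.exp(-2j * cmath.pi * dif_exponent(i, s, m) / n)
                y[i], y[i + d] = a + b, (a - b) * w
    return y


def naive_dft(x):
    """O(n^2) reference DFT used only to check the sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def exponents_per_processor(variant, m, p):
    """Distinct twiddle exponents of the butterflies that each of 2**p processors
    takes part in during the p inter-processor stages, assuming 2**m elements
    stored consecutively (an illustrative count, not the paper's)."""
    n, local = 1 << m, 1 << (m - p)
    exp_fn = dit_exponent if variant == "dit" else dif_exponent
    sets = [set() for _ in range(1 << p)]
    for s in range(p):
        d = 1 << (m - 1 - s)
        for i in range(n):
            sets[i // local].add(exp_fn(i & ~d, s, m))
    return sets
```

Calling `exponents_per_processor("dit", 4, 2)` and `exponents_per_processor("dif", 4, 2)` compares the two variants for 16 elements on 4 processors. In this sketch the decimation-in-time exponents in the inter-processor stages are determined by the processor address alone, while the decimation-in-frequency exponents vary with the local address. That is the kind of asymmetry the abstract's storage comparison rests on.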