An Adaptation of the Fast Fourier Transform for Parallel Processing

Authors:
Marshall C. Pease
Affiliations:
Computer Techniques Laboratory, Stanford Research Institute, Menlo Park, California
Venue:
Journal of the ACM (JACM)
Year:
1968

Citing 0
Cited 80

Finite State Model and Compatibility Theory: New Analysis Tools for Permutation Networks

IEEE Transactions on Computers
Fourier transform and convolution subroutines for the IBM 3090 Vector facility

IBM Journal of Research and Development
AT² = O(N log⁴ N), T = O(log N) fast Fourier transform in a light connected 3-dimensional VLSI

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Reduced distance routing in single-state shuffle-exchange interconnection networks

SIGMETRICS '87 Proceedings of the 1987 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures

IEEE Transactions on Computers
Performance analysis of the FFT algorithm on a shared-memory parallel architecture

IBM Journal of Research and Development
Best worst mappings for the omega network

IBM Journal of Research and Development
An Architecture for a Video Rate Two-Dimensional Fast Fourier Transform Processor

IEEE Transactions on Computers
The rice parallel processing testbed

SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The fast Fourier transform and sparse matrix computations: a study of two applications on the HORIZON supercomputer

Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Parallel algorithms for super performance

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Multilinear algebra and parallel programming

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Unified Architecture for Divide and Conquer Based Tridiagonal System Solvers

IEEE Transactions on Computers
Parallel Implementation of Multidimensional Transforms without Interprocessor Communication

IEEE Transactions on Computers
Organization of Large Scale Fourier Processors

Journal of the ACM (JACM)
Kronecker Matrices, Computer Implementation, and Generalized Spectra

Journal of the ACM (JACM)
An Augmented Content-Addressed Memory Array for Implementation With Large-Scale Integration

Journal of the ACM (JACM)
A parallel computer based on cube connected cycles for wafer scale integration

ACM '86 Proceedings of 1986 ACM Fall joint computer conference
Multithreaded algorithms for the fast Fourier transform

Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Picture Processing by Computer

ACM Computing Surveys (CSUR)
Associative and Parallel Processors

ACM Computing Surveys (CSUR)
A Survey of Parallel Machine Organization and Programming

ACM Computing Surveys (CSUR)
Classification Categories and Historical Development of Circuit Switching Topologies

ACM Computing Surveys (CSUR)
Ultracomputers

ACM Transactions on Programming Languages and Systems (TOPLAS)
Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors

ACM Transactions on Programming Languages and Systems (TOPLAS)
The cube-connected cycles: a versatile network for parallel computation

Communications of the ACM
A functional approach to radix-r FFTs

Progress in computer research
An Efficient Architecture for the In-Place Fast Cosine Transform

Journal of VLSI Signal Processing Systems
Constant Geometry Fast Fourier Transforms on Array Processors

IEEE Transactions on Computers
Design and Analysis of Even-Sized Binary Shuffle-Exchange Networks for Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Concurrent Iterative Algorithm for Toeplitz-like Linear Systems

IEEE Transactions on Parallel and Distributed Systems
Parallel Architecture for Fast Transforms with Trigonometric Kernel

IEEE Transactions on Parallel and Distributed Systems
Concurrent Error Detection in Fast Unitary Transform Algorithms

DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
An efficient architecture for the in place fast cosine transform

ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Area-time complexity for VLSI

STOC '79 Proceedings of the eleventh annual ACM symposium on Theory of computing
On the parallel computation of local operations

STOC '71 Proceedings of the third annual ACM symposium on Theory of computing
A state-of-the-art SIMD two-dimensional FFT array processor

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
A modular memory scheme for array processing

ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
A parallel 1-D FFT algorithm for the Hitachi SR8000

Parallel Computing
Fast and accurate resource estimation of automatically generated custom DFT IP cores

Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Calculation scheme based on a weighted primitive: application to image processing transforms

EURASIP Journal on Applied Signal Processing
The Computation of Window Operations on a Parallel Organized Computer: A Case Study

IEEE Transactions on Computers
Parallel Processing with the Perfect Shuffle

IEEE Transactions on Computers
A Generalization of the Fast Fourier Transform

IEEE Transactions on Computers
Notes on Shuffle/Exchange-Type Switching Networks

IEEE Transactions on Computers
Data Manipulating Functions in Parallel Processors and Their Implementations

IEEE Transactions on Computers
Quotient Networks

IEEE Transactions on Computers
Interconnections Between Processors and Memory Modules Using the Shuffle-Exchange Network

IEEE Transactions on Computers
The Burroughs Scientific Processor (BSP)

IEEE Transactions on Computers
The Universality of the Shuffle-Exchange Network

IEEE Transactions on Computers
The Indirect Binary n-Cube Microprocessor Array

IEEE Transactions on Computers
The Design of a Class of Fast Fourier Transform Computers

IEEE Transactions on Computers
Serial Adders with Overflow Correction

IEEE Transactions on Computers
Parallel Permutations of Data: A Benes Network Control Algorithm for Frequently Used Permutations

IEEE Transactions on Computers
A Uniform Representation of Single-and Multistage Interconnection Networks Used in SIMD Machines

IEEE Transactions on Computers
Implementation of Permutation Functions in Illiac IV-Type Computers

IEEE Transactions on Computers
Two VLSI Structures for the Discrete Fourier Transform

IEEE Transactions on Computers
A VLSI Network for Variable Size FFT's

IEEE Transactions on Computers
Access and Alignment of Data in an Array Processor

IEEE Transactions on Computers
Formal datapath representation and manipulation for implementing DSP transforms

Proceedings of the 45th annual Design Automation Conference
Programming the Intel 80-core network-on-a-chip terascale processor

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A versatile VLSI fast Fourier transform processor

AFIPS '84 Proceedings of the July 9-12, 1984, national computer conference and exposition
Interconnection networks: a survey and assessment

AFIPS '74 Proceedings of the May 6-10, 1974, national computer conference and exposition
Parallel FFT with Eden Skeletons

PaCT '09 Proceedings of the 10th International Conference on Parallel Computing Technologies
A Shuffle-Exchange Network with Simplified Control

IEEE Transactions on Computers
Radix r^k FFTs: matricial representation and SDC/SDF pipeline implementation

IEEE Transactions on Signal Processing
Radix-4 FFT algorithms with ordered input and output data

DSP'09 Proceedings of the 16th international conference on Digital Signal Processing
FFT algorithms for vector computers

Parallel Computing
Evaluating the performance of space plasma simulations using FPGA's

VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
A hybrid parallel M-D FFT algorithm without interprocessor communication

ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: digital speech processing - Volume III
Some computer organizations and their effectiveness

IEEE Transactions on Computers
Kronecker products and shuffle algebra

IEEE Transactions on Computers
Computer Generation of Hardware for Linear Digital Signal Processing Transforms

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Computer generation of streaming sorting networks

Proceedings of the 49th Annual Design Automation Conference
A transpose-free in-place SIMD optimized FFT

ACM Transactions on Architecture and Code Optimization (TACO)
High performance FFT on SGI Altix 3700

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Towards efficient arithmetic for lattice-based cryptography on reconfigurable hardware

LATINCRYPT'12 Proceedings of the 2nd international conference on Cryptology and Information Security in Latin America
Influence of memory access patterns to small-scale FFT performance

The Journal of Supercomputing
A high performance split-radix FFT with constant geometry architecture

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe

Quantified Score

Hi-index	0.10

Abstract

A modified version of the Fast Fourier Transform is developed and described. This version is well adapted for use in a special-purpose computer designed for the purpose. It is shown that only three operators are needed. One operator replaces successive pairs of data points by their sums and differences. The second operator performs a fixed permutation which is an ideal shuffle of the data. The third operator permits the multiplication of a selected subset of the data by a common complex multiplier.

If, as seems reasonable, the slowest operation is the complex multiplications required, then, for reasonably sized data sets (e.g., 512 complex numbers), parallelization by the method developed should allow an increase of speed over the serial use of the Fast Fourier Transform by about two orders of magnitude.

It is suggested that a machine to realize the speed improvement indicated is quite feasible.

The analysis is based on the use of the Kronecker product of matrices. It is suggested that this form is of general use in the development and classification of various modifications and extensions of the algorithm.
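
The three operators named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`sum_difference`, `perfect_shuffle`, `twiddle`, `constant_geometry_fft`), the decimation-in-frequency stage ordering, the final bit-reversal step, and the twiddle indexing are all assumptions chosen so that the result matches a direct DFT. The abstract describes the third operator as applying one common multiplier to a selected subset; for brevity, the sketch applies every multiplier of a stage in a single pass. It is serial code and only shows that every stage uses the same data-routing pattern.

```python
import cmath


def sum_difference(data):
    """Operator 1: replace each successive pair (a, b) by (a + b, a - b)."""
    out = []
    for i in range(0, len(data), 2):
        a, b = data[i], data[i + 1]
        out += [a + b, a - b]
    return out


def perfect_shuffle(data):
    """Operator 2: interleave the halves, out[2i] = data[i], out[2i+1] = data[i + N/2]."""
    half = len(data) // 2
    out = []
    for i in range(half):
        out += [data[i], data[half + i]]
    return out


def twiddle(data, stage):
    """Operator 3 (assumed indexing): scale the difference outputs at odd positions."""
    n = len(data)
    out = list(data)
    for p in range(1, n, 2):
        exponent = (p >> (stage + 1)) << stage
        out[p] *= cmath.exp(-2j * cmath.pi * exponent / n)
    return out


def bit_reverse(data):
    """Reorder so that out[k] = data[bit-reversed k] (assumed final unscrambling)."""
    n = len(data)
    bits = n.bit_length() - 1
    return [data[int(format(k, f"0{bits}b")[::-1], 2)] for k in range(n)]


def constant_geometry_fft(data):
    """Apply shuffle, sum/difference, and multiply once per stage, log2(N) times."""
    n = len(data)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 2")
    work = [complex(v) for v in data]
    for stage in range(n.bit_length() - 1):
        work = twiddle(sum_difference(perfect_shuffle(work)), stage)
    return bit_reverse(work)


def naive_dft(data):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(data)
    return [
        sum(data[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    samples = [complex(i % 5, -(i % 3)) for i in range(8)]
    fast = constant_geometry_fft(samples)
    slow = naive_dft(samples)
    print(max(abs(f - s) for f, s in zip(fast, slow)))  # largest deviation from the direct DFT
```

Because the shuffle and the pairwise sum/difference are identical at every stage, only the set of multipliers changes from stage to stage, which is the property that makes this form attractive for fixed wiring in special-purpose hardware.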