A modified version of the Fast Fourier Transform is developed and described. This version is well adapted for use in a special-purpose computer designed for the purpose. It is shown that only three operators are needed. One operator replaces successive pairs of data points by their sums and differences. The second operator performs a fixed permutation which is an ideal shuffle of the data. The third operator permits the multiplication of a selected subset of the data by a common complex multiplier.

If, as seems reasonable, the slowest operation is the complex multiplications required, then, for reasonably sized data sets (e.g., 512 complex numbers), parallelization by the method developed should allow an increase of speed over the serial use of the Fast Fourier Transform by about two orders of magnitude.

It is suggested that a machine to realize the speed improvement indicated is quite feasible.

The analysis is based on the use of the Kronecker product of matrices. It is suggested that this form is of general use in the development and classification of various modifications and extensions of the algorithm.
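The three operators named in the abstract can be illustrated with a small serial sketch in Python. This is one possible constant-geometry arrangement (decimation in frequency, with a final bit-reversal to restore natural order), not necessarily the exact factorization derived in the paper. The function names, the stage ordering, the twiddle-exponent rule, and the sign convention exp(-2*pi*i/N) are all assumptions made for illustration.

```python
import cmath


def perfect_shuffle(x):
    """Ideal shuffle: interleave the first and second halves of x."""
    half = len(x) // 2
    out = [0j] * len(x)
    out[0::2] = x[:half]
    out[1::2] = x[half:]
    return out


def sum_difference(x):
    """Replace each successive pair (x[2j], x[2j+1]) by its sum and difference."""
    out = list(x)
    for k in range(0, len(x), 2):
        out[k], out[k + 1] = x[k] + x[k + 1], x[k] - x[k + 1]
    return out


def twiddle(x, stage):
    """Multiply the odd-indexed entries by powers of W = exp(-2*pi*i/N).

    Assumed rule for this sketch: entry 2j+1 uses exponent (j >> stage) << stage,
    so runs of 2**stage consecutive odd-indexed entries share one multiplier.
    """
    n = len(x)
    out = list(x)
    for j in range(n // 2):
        exponent = (j >> stage) << stage
        out[2 * j + 1] *= cmath.exp(-2j * cmath.pi * exponent / n)
    return out


def bit_reverse(x):
    """Undo the bit-reversed ordering left by the stages above."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for p in range(n):
        out[int(format(p, f"0{bits}b")[::-1], 2)] = x[p]
    return out


def fft_three_operators(x):
    """FFT built only from shuffle, sum/difference, and twiddle stages."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    data = [complex(v) for v in x]
    for stage in range(n.bit_length() - 1):
        # Every stage applies the same fixed data movement.
        data = twiddle(sum_difference(perfect_shuffle(data)), stage)
    return bit_reverse(data)


def naive_dft(x):
    """Direct O(N^2) DFT, used only to check the sketch."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

Every stage uses the same shuffle and the same pairing, so only the multipliers change from stage to stage. In Kronecker-product terms, `sum_difference` applies the identity matrix of order N/2 Kronecker-multiplied by the 2-point transform. The sketch runs serially; in a parallel machine each of the N/2 pair operations within a stage could be carried out at once.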