We present a study of the parallelization of the Cooley-Tukey radix-2 FFT algorithm for MIMD (nonvector) architectures. Parallel algorithms are presented for one-dimensional and multidimensional Fourier transforms. From instruction traces obtained by executing Fortran kernels derived from our algorithms, we determined the precise instructions to be executed by each processor in the parallel system. We used these instruction traces to predict the performance of the IBM Research Parallel Processing Prototype, RP3, when computing FFTs. Our performance results are presented as graphs in the paper.
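As a rough illustration of why the radix-2 Cooley-Tukey algorithm lends itself to MIMD execution, the sketch below computes a one-dimensional FFT in which every stage consists of n/2 mutually independent butterflies that are dealt out to processors. It is a minimal Python sketch, not the Fortran kernels or the partitioning scheme used in the paper: the function names, the `num_procs` parameter, and the cyclic assignment of butterflies are all assumptions made for illustration, and the "processors" are simulated by a sequential loop.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x, num_procs=4):
    """Radix-2 decimation-in-time FFT of a power-of-two-length sequence.

    num_procs is a hypothetical processor count; the per-processor work
    is simulated sequentially here.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1

    # Reorder the input into bit-reversed order.
    a = [0j] * n
    for i, v in enumerate(x):
        a[bit_reverse(i, bits)] = complex(v)

    half = 1  # distance between the two inputs of a butterfly
    while half < n:
        # Enumerate the n/2 butterflies of this stage. Each one reads
        # and writes its own pair of elements, so they are independent.
        butterflies = []
        for k in range(n // 2):
            group, j = divmod(k, half)
            top = group * 2 * half + j
            butterflies.append((top, top + half, j))

        # Assumed cyclic partition: processor p takes butterflies
        # p, p + num_procs, p + 2 * num_procs, ...
        for p in range(num_procs):
            for top, bot, j in butterflies[p::num_procs]:
                w = cmath.exp(-2j * cmath.pi * j / (2 * half))
                t = w * a[bot]
                a[top], a[bot] = a[top] + t, a[top] - t

        # On a real machine a barrier would separate the stages here.
        half *= 2
    return a
```

Because the butterflies within a stage touch disjoint pairs of elements, the result does not depend on `num_procs` or on the order in which the simulated processors run. Only the boundary between stages needs synchronization.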