We propose a 1-D FFT routine for distributed-memory vector-parallel machines that gives the user both high performance and flexibility in data distribution. Our routine reads its input and writes its output in block cyclic distribution, and the user can specify the input and output block sizes independently. This flexibility costs the same amount of inter-processor communication as the widely used transpose algorithm, and no separate data redistribution step (with its extra overhead) is needed. We implemented our method on the Hitachi SR2201, a distributed-memory parallel machine with pseudo-vector processing nodes, and obtained 45% of peak performance on 16 nodes for a problem size of N = 2^24. This performance remained unchanged across block sizes from 1 to 16.
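The following Python sketch makes "block cyclic data distribution" concrete. It is not code from the paper and is not the authors' FFT. It shows only the standard block-cyclic index mapping, the layout ScaLAPACK also uses. A block size of 1 gives a purely cyclic layout, and a block size of N/P gives a plain block layout. The function names `owner_and_local`, `global_index`, `layout` and `elements_moved` are hypothetical helpers introduced here for illustration.

```python
def owner_and_local(i: int, block: int, nprocs: int) -> tuple[int, int]:
    """Map global index i to (owning processor, local index) under a
    block cyclic distribution with block size `block` over `nprocs` processors."""
    blk = i // block            # global block that contains element i
    proc = blk % nprocs         # blocks are dealt out to processors round-robin
    local_blk = blk // nprocs   # number of earlier blocks this processor already holds
    return proc, local_blk * block + i % block


def global_index(proc: int, local: int, block: int, nprocs: int) -> int:
    """Inverse mapping: (processor, local index) -> global index."""
    local_blk, offset = divmod(local, block)
    return (local_blk * nprocs + proc) * block + offset


def layout(n: int, block: int, nprocs: int) -> list[list[int]]:
    """For each processor, list the global indices it stores, in local order."""
    parts: list[list[int]] = [[] for _ in range(nprocs)]
    for i in range(n):
        proc, _ = owner_and_local(i, block, nprocs)
        parts[proc].append(i)  # ascending i also means ascending local index
    return parts


def elements_moved(n: int, block_in: int, block_out: int, nprocs: int) -> int:
    """Count the elements whose owner differs between two block sizes, i.e. the
    elements a naive separate redistribution step would have to send."""
    return sum(
        owner_and_local(i, block_in, nprocs)[0] != owner_and_local(i, block_out, nprocs)[0]
        for i in range(n)
    )


if __name__ == "__main__":
    print(layout(16, 1, 4)[0])           # cyclic:       [0, 4, 8, 12]
    print(layout(16, 2, 4)[0])           # block size 2: [0, 1, 8, 9]
    print(layout(16, 4, 4)[0])           # plain block:  [0, 1, 2, 3]
    print(elements_moved(16, 1, 2, 4))   # 12
```

When the input and output block sizes differ, most elements end up owned by a different processor after the transform. The point of the abstract is that this change of layout is absorbed into the communication the transpose algorithm already performs, so the routine avoids the extra redistribution counted by `elements_moved`.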