A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs

Authors:
Daisuke Takahashi; Taisuke Boku; Mitsuhisa Sato
Venue:
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Year:
2002

Abstract

In this paper, we propose a blocking algorithm for a parallel one-dimensional fast Fourier transform (FFT) on clusters of PCs. Our proposed parallel FFT algorithm is based on the six-step FFT algorithm. The six-step FFT algorithm can be altered into a block nine-step FFT algorithm to reduce the number of cache misses. The block nine-step FFT algorithm improves performance by utilizing the cache memory effectively. We use the block nine-step FFT algorithm to design the parallel one-dimensional FFT algorithm. Because our proposed parallel FFT algorithm uses cyclic distribution, all-to-all communication is required only once. Moreover, both the input data and the output data can be given in natural order. We achieved performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
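
For orientation, the six-step FFT that the abstract builds on can be sketched as follows. This is a minimal serial illustration of the textbook six-step formulation (transpose, row transforms, twiddle multiplication, transpose, row transforms, transpose) for a length N = n1 * n2; it is not the authors' block nine-step or parallel implementation, and it does no cache blocking or communication. The names `six_step_fft`, `dft`, and `transpose` are hypothetical, and a naive O(n^2) DFT stands in for the short row FFTs to keep the sketch self-contained.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; a stand-in for the short row FFTs (assumption)."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def six_step_fft(x, n1, n2):
    """Six-step FFT of x (length n1 * n2), input and output in natural order."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n2 x n1 matrix: rows[j2][j1] = x[j1 + j2 * n1].
    rows = [list(x[j2 * n1:(j2 + 1) * n1]) for j2 in range(n2)]
    # Step 1: transpose to n1 x n2, so a[j1][j2] = x[j1 + j2 * n1].
    a = transpose(rows)
    # Step 2: n1 independent transforms of length n2.
    b = [dft(row) for row in a]
    # Step 3: twiddle multiplication by exp(-2*pi*i * j1 * k2 / n).
    c = [
        [b[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n) for k2 in range(n2)]
        for j1 in range(n1)
    ]
    # Step 4: transpose to n2 x n1.
    d = transpose(c)
    # Step 5: n2 independent transforms of length n1.
    e = [dft(row) for row in d]
    # Step 6: transpose to n1 x n2; f[k1][k2] holds output index k2 + k1 * n2.
    f = transpose(e)
    return [value for row in f for value in row]
```

Calling `six_step_fft(x, 4, 8)` on a 32-point sequence should agree with a direct DFT of `x` up to rounding error. The three transposes are the steps a blocked variant would presumably restructure to improve cache reuse.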