A high performance parallel algorithm for 1-D FFT

Authors:
R. C. Agarwal;F. G. Gustavson;M. Zubair
Affiliations:
IBM T.J. Watson Research Center, Yorktown Hts., NY;IBM T.J. Watson Research Center, Yorktown Hts., NY;IBM T.J. Watson Research Center, Yorktown Hts., NY
Venue:
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Year:
1994

Abstract

In this paper we propose a high-performance parallel FFT algorithm based on a multidimensional formulation. We use it to solve a commonly encountered FFT-based kernel on a distributed-memory parallel machine, the IBM scalable parallel system SP1. The kernel consists of a forward FFT of an input sequence, multiplication of the transformed data by a coefficient array, and an inverse FFT of the result. We show that the multidimensional formulation reduces communication costs and also improves single-node performance by making effective use of the node's memory system. We implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.
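
The kernel and the idea of a multidimensional formulation can be illustrated with a small single-process sketch. It is not the authors' implementation: it assumes the simplest case, a two-dimensional factorization N = n1 * n2, uses a naive O(n^2) DFT for the short transforms, and does no inter-node communication. The function names (`dft`, `fft_2d_formulation`, `fft_kernel`) are hypothetical and chosen only for this example.

```python
import cmath

def dft(x, sign=-1):
    """Naive O(n^2) DFT. sign=-1 is forward, sign=+1 is the unnormalized inverse."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_2d_formulation(x, n1, n2, sign=-1):
    """Length n1*n2 DFT written as short DFTs over a 2-D view of the data.

    Input index  j = j1*n2 + j2, output index k = k1 + n1*k2.
    """
    n = n1 * n2
    assert len(x) == n
    # Step 1: n2 independent length-n1 DFTs over the strided columns x[j2::n2].
    cols = [dft(x[j2::n2], sign) for j2 in range(n2)]  # cols[j2][k1]
    # Step 2: multiply by the twiddle factors exp(sign * 2*pi*i * j2*k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(sign * 2j * cmath.pi * j2 * k1 / n)
    # Step 3: n1 independent length-n2 DFTs across the columns.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[j2][k1] for j2 in range(n2)], sign)  # row[k2]
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out

def fft_kernel(x, coeff, n1, n2):
    """Forward FFT, pointwise multiply by coeff, inverse FFT (normalized by 1/N)."""
    fwd = fft_2d_formulation(x, n1, n2, sign=-1)
    prod = [a * c for a, c in zip(fwd, coeff)]
    inv = fft_2d_formulation(prod, n1, n2, sign=+1)
    return [v / len(x) for v in inv]
```

For example, `fft_kernel(x, [1.0] * 12, 3, 4)` on a length-12 sequence `x` should return `x` again up to rounding error, since an all-ones coefficient array turns the kernel into a forward transform followed by its inverse. In a distributed-memory setting, the short transforms in steps 1 and 3 are independent of one another, so the data rearrangement between them is presumably where communication would occur.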