High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers

  • Authors:
  • Daisuke Takahashi; Yasumasa Kanada

  • Affiliations:
  • Computer Centre, University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, 113-8658, Japan (daisuke@pi.cc.u-tokyo.ac.jp); Computer Centre, University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, 113-8658, Japan (kanada@pi.cc.u-tokyo.ac.jp)

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2000

Abstract

In this paper, we propose high-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. We use the four-step or six-step FFT algorithms to implement the radix-2, 3 and 5 parallel 1-D complex FFT algorithms. Because our parallel FFT algorithms use cyclic distribution, all-to-all communication takes place only once. Moreover, the input data and output data are both in natural order. We also show that the suitability of a parallel FFT algorithm is machine-dependent because of differences in the architecture of the processor elements in distributed-memory parallel computers. Experimental results for 2^p 3^q 5^r-point FFTs on two distributed-memory parallel computers, the HITACHI SR2201 and the IBM SP2, are reported. We achieved performance of about 130 GFLOPS on a 1024-PE HITACHI SR2201 and about 1.25 GFLOPS on a 32-PE IBM SP2.
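
The abstract names the four-step and six-step FFT algorithms without spelling them out. As a rough illustration only, the serial Python sketch below shows the factorization N = n1 * n2 that both variants build on: short DFTs along one dimension, a twiddle-factor multiplication, then short DFTs along the other dimension. It is not the authors' implementation. The function names `dft` and `four_step_fft` are hypothetical, the naive O(n^2) `dft` merely stands in for an optimized radix-2, 3 and 5 kernel, and the explicit transposes of the six-step variant are folded into index arithmetic. Nothing here models the cyclic distribution or the all-to-all communication described above.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT; a stand-in (assumption) for a radix-2/3/5 kernel."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """DFT of length n1*n2 via short DFTs, a twiddle multiply, short DFTs.

    Input index:  j = j1 + j2*n1   (0 <= j1 < n1, 0 <= j2 < n2)
    Output index: k = k2 + k1*n2   (0 <= k1 < n1, 0 <= k2 < n2)
    Both input and output are in natural order.
    """
    n = n1 * n2
    assert len(x) == n
    # View x as an n1 x n2 array: rows[j1][j2] = x[j1 + j2*n1].
    rows = [[x[j1 + j2 * n1] for j2 in range(n2)] for j1 in range(n1)]
    # Step 1: n1 independent n2-point DFTs (index j2 -> k2).
    rows = [dft(r) for r in rows]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i*j1*k2/N).
    for j1 in range(n1):
        for k2 in range(n2):
            rows[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # Step 3: n2 independent n1-point DFTs (index j1 -> k1).
    y = [0j] * n
    for k2 in range(n2):
        col = dft([rows[j1][k2] for j1 in range(n1)])
        for k1 in range(n1):
            y[k2 + k1 * n2] = col[k1]
    return y
```

For example, a 30-point transform (30 = 2 * 3 * 5) could be computed as `four_step_fft(x, 5, 6)`, and the result should agree with `dft(x)` up to rounding error.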