A vector-parallel FFT with a user-specifiable data distribution scheme

  • Authors:
  • Yusaku Yamamoto; Mitsuyoshi Igai; Ken Naono

  • Affiliations:
  • Central Research Laboratory, Hitachi Ltd, Kokubunji, Tokyo, Japan; Hitachi ULSI Technology Corp., Kokubunji, Tokyo, Japan; Central Research Laboratory, Hitachi Ltd, Kokubunji, Tokyo, Japan

  • Venue:
  • ISPA'03 Proceedings of the 2003 international conference on Parallel and distributed processing and applications
  • Year:
  • 2003

Abstract

We propose a 1-dimensional FFT routine for distributed-memory vector-parallel machines which provides the user with both high performance and flexibility in data distribution. Our routine inputs and outputs data using a block-cyclic data distribution, and the block sizes for input and output can be specified independently by the user. This flexibility is realized with the same amount of inter-processor communication as the widely used transpose algorithm, and no additional overhead for data redistribution is necessary. We implemented our method on the Hitachi SR2201, a distributed-memory parallel machine with pseudovector processing nodes, and obtained 45% of the peak performance on 16 nodes when the problem size is N = 2^24. This performance was unchanged for a wide range of block sizes from 1 to 16.
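
As an illustration of the block-cyclic data distribution mentioned in the abstract, the following minimal Python sketch maps global element indices to processors for a given block size. It shows only the standard textbook definition of a block-cyclic layout, not the authors' FFT routine. The function names `block_cyclic_owner` and `block_cyclic_layout` and the small sizes used are assumptions chosen for illustration.

```python
def block_cyclic_owner(i, block, nprocs):
    """Return (rank, local_index) of global index i under a block-cyclic layout.

    Hypothetical helper: consecutive blocks of `block` elements are dealt
    out to ranks 0, 1, ..., nprocs - 1 in round-robin order.
    """
    if block <= 0 or nprocs <= 0:
        raise ValueError("block and nprocs must be positive")
    block_id = i // block               # which block the element falls in
    rank = block_id % nprocs            # blocks are assigned cyclically
    local_block = block_id // nprocs    # blocks this rank already holds
    return rank, local_block * block + i % block


def block_cyclic_layout(n, block, nprocs):
    """List the global indices held by each rank, in local storage order."""
    layout = [[] for _ in range(nprocs)]
    for i in range(n):
        rank, _ = block_cyclic_owner(i, block, nprocs)
        layout[rank].append(i)
    return layout


if __name__ == "__main__":
    # Illustrative sizes only: 16 elements on 4 processors.
    for block in (1, 2, 4):
        print(block, block_cyclic_layout(16, block, 4))
```

With block size 1 this reduces to a purely cyclic layout, and with block size n / nprocs to a plain block layout. A routine such as the one described would let the caller pick one block size for the input and a different one for the output.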