An implementation of parallel 1-D FFT using SSE3 instructions on dual-core processors

  • Authors: Daisuke Takahashi
  • Affiliations: Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan
  • Venue: PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
  • Year: 2006

Abstract

This paper proposes an implementation of a parallel one-dimensional fast Fourier transform (FFT) using Streaming SIMD Extensions 3 (SSE3) instructions on dual-core processors. Combining vectorization with the block six-step FFT algorithm is shown to improve performance effectively. Performance results for one-dimensional FFTs on dual-core Intel Xeon processors are reported. For a 2^20-point FFT, the implementation achieved approximately 2006 MFLOPS on a dual-core Intel Xeon PC (2.8 GHz, two CPUs, four cores) and approximately 3492 MFLOPS on a dual-core Intel Xeon 5150 PC (2.66 GHz, two CPUs, four cores).
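
For readers unfamiliar with the six-step FFT named in the abstract, the following is a minimal sketch of the plain (unblocked, scalar) six-step decomposition for N = n1 * n2. It is not the author's implementation: it omits the cache blocking, the SSE3 vectorization, and the multi-core parallelization that the paper is about, and it uses a naive DFT as the small-transform kernel. The names `dft`, `transpose`, and `six_step_fft` are illustrative assumptions.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; stands in for the short FFT kernel and serves as a reference."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(rows):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*rows)]


def six_step_fft(x, n1, n2):
    """Plain six-step FFT of length n1 * n2 (illustrative sketch only)."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n2 x n1 matrix: a[j2][j1] = x[j1 + j2 * n1].
    a = [x[j2 * n1:(j2 + 1) * n1] for j2 in range(n2)]
    # Step 1: transpose to n1 x n2.
    b = transpose(a)
    # Step 2: n1 independent transforms of length n2.
    c = [dft(row) for row in b]
    # Step 3: multiply by the twiddle factors w_N^(j1 * k2).
    c = [
        [c[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n) for k2 in range(n2)]
        for j1 in range(n1)
    ]
    # Step 4: transpose to n2 x n1.
    d = transpose(c)
    # Step 5: n2 independent transforms of length n1.
    e = [dft(row) for row in d]
    # Step 6: transpose to n1 x n2 and flatten, so output index k = k2 + k1 * n2.
    f = transpose(e)
    return [val for row in f for val in row]
```

The independent row transforms in steps 2 and 5 are the parts that lend themselves to distribution across cores, and the three transposes are what a blocked variant would reorganize to improve cache reuse.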