Performance analysis of Cooley-Tukey FFT algorithms for a many-core architecture

  • Authors:
  • Long Chen; Guang R. Gao

  • Affiliations:
  • University of Delaware, Newark, Delaware; University of Delaware, Newark, Delaware

  • Venue:
  • SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
  • Year:
  • 2010

Abstract

Given that many-core architectures are becoming the mainstream framework for high-performance computing, it is important to develop performance models for these architectures to assist parallel algorithm design and application performance tuning. In this paper, we propose a performance modeling technique for parallel Cooley-Tukey FFT algorithms on an abstract many-core architecture that captures the generic features and parameters of a class of real many-core architectures. We have verified our performance model on the IBM Cyclops-64 (C64) many-core architecture. The experimental results demonstrate that our model predicts the performance trend accurately, with an average relative error of 16% when running on up to 16 cores. The average relative error gradually increases to 29% when running on up to 64 cores. The experimental results also reveal that the key to performance on this class of many-core architectures is using local memory and higher-radix algorithms to reduce memory traffic.
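
The claim that higher-radix algorithms reduce memory traffic can be illustrated with a back-of-the-envelope count. The sketch below is not the performance model proposed in the paper. It assumes a simplified picture in which each pass of a radix-R Cooley-Tukey FFT loads and stores every one of the N data points exactly once from the slower memory level, ignores twiddle-factor traffic, and requires N to be an exact power of R. The function names `fft_passes` and `memory_traffic_points` are hypothetical and chosen only for this illustration.

```python
def fft_passes(n: int, radix: int) -> int:
    """Number of butterfly passes for an n-point radix-`radix` FFT.

    Assumption: n is an exact power of radix, so passes = log_radix(n).
    """
    if n < 1 or radix < 2:
        raise ValueError("n must be >= 1 and radix must be >= 2")
    passes = 0
    size = 1
    while size < n:
        size *= radix
        passes += 1
    if size != n:
        raise ValueError("n must be an exact power of radix")
    return passes


def memory_traffic_points(n: int, radix: int) -> int:
    """Estimated data points moved to and from the slower memory level.

    Assumption: every pass reads n points and writes n points once;
    twiddle factors and caching effects are ignored.
    """
    return 2 * n * fft_passes(n, radix)


if __name__ == "__main__":
    n = 4096
    for radix in (2, 8, 64):
        print(radix, fft_passes(n, radix), memory_traffic_points(n, radix))
```

Under these assumptions, a 4096-point transform needs 12 passes with radix 2 but only 4 passes with radix 8, so the estimated data traffic drops by a factor of three. A radix-R butterfly needs its R inputs close to the core at the same time, which is one reason local memory matters for higher radices.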