As many-core architectures become the mainstream platform for high-performance computing, a performance model for them is needed to assist parallel algorithm design and application performance tuning. In this paper, we propose a performance modeling technique for parallel Cooley-Tukey FFT algorithms on an abstract many-core architecture that captures the generic features and parameters of a class of real many-core architectures. We have verified our performance model on the IBM Cyclops-64 (C64) many-core architecture. The experimental results demonstrate that our model predicts the performance trend accurately, with an average relative error of 16% when running on up to 16 cores. The average relative error gradually increases to 29% when running on up to 64 cores. The experimental results also reveal that the key to performance for this class of many-core architectures is using local memory and higher-radix algorithms to reduce memory traffic requirements.
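
To illustrate why a higher radix can reduce memory traffic, the following minimal sketch counts butterfly passes for an N-point Cooley-Tukey FFT. It is not the performance model proposed in the paper. It assumes, purely for illustration, that every pass loads and stores each element exactly once from off-chip memory and that N is an exact power of the radix. The function names `fft_passes` and `memory_traffic` are hypothetical.

```python
def fft_passes(n: int, radix: int) -> int:
    """Number of butterfly passes for a radix-`radix` FFT of size n.

    Assumes n is an exact power of radix; raises ValueError otherwise.
    """
    if n < 1 or radix < 2:
        raise ValueError("n must be >= 1 and radix must be >= 2")
    passes = 0
    size = 1
    while size < n:
        size *= radix
        passes += 1
    if size != n:
        raise ValueError("n must be an exact power of radix")
    return passes


def memory_traffic(n: int, radix: int) -> int:
    """Elements moved to and from memory over the whole transform.

    Illustrative assumption: each pass reads and writes all n elements once.
    """
    return 2 * n * fft_passes(n, radix)


if __name__ == "__main__":
    n = 2 ** 12  # 4096 points, a power of 2, 4, 8 and 16
    for radix in (2, 4, 8, 16):
        print(f"radix={radix:2d} passes={fft_passes(n, radix):2d} "
              f"elements moved={memory_traffic(n, radix)}")
```

Under these assumptions, a 4096-point transform needs 12 passes at radix 2 but only 3 at radix 16, so the estimated traffic drops by a factor of four. A real implementation must also fit each radix-r butterfly's working set into local memory, which this sketch does not model.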