An Adaptation of the Fast Fourier Transform for Parallel Processing
Journal of the ACM (JACM)
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
Proceedings of the 36th annual international symposium on Computer architecture
Auto-tuning 3-D FFT library for CUDA GPUs
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
An adaptive performance modeling tool for GPU architectures
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Model-driven autotuning of sparse matrix-vector multiply on GPUs
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
FFT Implementation on a Streaming Architecture
PDP '11 Proceedings of the 2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing
A quantitative performance analysis model for GPU architectures
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Hi-index | 0.00 |
Modern GPUs (Graphics Processing Units) offer very high computing power at relative low cost. To take advantage of their computing resources and develop efficient implementations is essential to have certain knowledge about the architecture and memory hierarchy. In this paper, we use the FFT (Fast Fourier Transform) as a benchmark tool to analyze different aspects of GPU architectures, like the influence of the memory access pattern or the impact of the register pressure. The FFT is a good tool for performance analysis because it is used in many digital signal processing applications and has a good balance between computational cost and memory bandwidth requirements.