The growing computational power of modern graphics processing units (GPUs) makes them well suited to general-purpose computing. These commodity processors generally operate as parallel SIMD platforms and, among other factors, the efficiency of the code depends on proper exploitation of the underlying memory hierarchy. This paper deals with the implementation of the Fast Fourier Transform on a novel graphics architecture recently introduced by NVIDIA. The implementation takes into account memory reference locality issues, which are crucial when pursuing a high degree of parallelism, that is, good occupancy of the processing elements. The proposed implementation has been tested and compared with the manufacturer's own implementation.