Benchmarking GPUs to tune dense linear algebra
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
This paper describes our winning submission for the Absolute Performance category of the MEMOCODE 2009 Design Contest. We show that our GPGPU-based design achieves performance within a factor of four of the theoretical maximum for the implemented algorithm. This result was reached after a short design cycle of 2 man-days, which indicates that the NVIDIA CUDA platform allows rapid development and optimization of applications that make substantial use of all available GPGPU computing resources. We also analyze the maximum theoretical performance of alternative computing systems that could have been used to implement the algorithm.