IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
The diversity of architectural designs and programming styles in emerging computational hardware has created a wide search space for performance optimisation in the development of next-generation high-performance software. Preliminary performance evaluations (PPE) on various computational platforms are essential for guiding software design choices. In this paper, we study the performance of the numerical kernels of determinant quantum Monte Carlo (DQMC) simulations on two popular computing processors: multi-core CPUs and GPUs. Two algorithms, Loh's method and the SOF algorithm, are tested with different implementations and problem configurations to explore hardware characteristics such as scalability and processor utilisation. The results of this PPE, which show the favoured algorithms and applicable parameter ranges on the two platforms, provide useful technical information not only for this particular computation but also for other applications that use similar computation kernels.